Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steven Samuels <sjsamuels@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: SVY question

Date: Sun, 28 Aug 2011 16:39:01 -0400

That's clear, Dmitris. So ignore stratification at the worker stage:

****************************************
egen stratum = group(firm_size x_firm)
svyset firm_id [pweight = myweight], strata(stratum)
****************************************

You must then construct your sampling weight as the product of two components:

myweight = (1/p1) × (1/p2)

1. p1 = probability of selecting a firm = 30/(number of firms in the firm's stratum)
2. p2 = probability of selecting a worker = (number of workers selected)/(number of eligible workers), where the selections were made separately in each firm according to the selection plan you describe.

To check how well the estimated numbers of workers in each category match the known numbers, you should run:

******************
svy: tab firm_size x_firm, cell
******************

Unfortunately, I believe that with this design the numbers will not match well, possibly with a bias toward smaller firms. If this occurs, use the poststrata() and postweight() options of -svyset- to match the known numbers.

If you have information on individual firm sizes in the population, you can do better. Create a firm-size variable with more categories (e.g. 5-24, 25-99, 100-199, 200-299, ...). Estimate the numbers of workers in the population by refined firm-size category and x_firm, and compare them to the known numbers. If these differ, as I suspect they will, then post-stratify on these numbers instead.

If you intend to do longitudinal analyses with -xtmixed- in Stata 12, then you must compute the post-stratified probability weights yourself. Suppose the weighted proportion of workers in post-stratum k is p_k and the proportion of workers in the population in post-stratum k is P_k; then create a new weight as

new_weight = myweight*P_k/p_k

Steve

On Aug 27, 2011, at 4:50 PM, Pavlopoulos, D. wrote:

Dear Steve,

Thank you for your reply. Below you can find a description of my sampling:

- I select companies that do not have workers using the arrangement X. I exclude companies with fewer than 5 workers.
- I split the companies according to their size into groups having 5-24, 25-99, 100-499 and 500+ workers.
- Within each of these groups I select a random sample of 30 companies.
- I select companies that have at least one worker using the arrangement X. I exclude companies with fewer than 5 workers.
- I split the companies according to their size into groups having 5-24, 25-99, 100-499 and 500+ workers.
- Within each of these groups I select a random sample of 30 companies.
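The two-component weight in Steve's reply is simple arithmetic, and a small sketch makes the inverse-probability logic explicit. This is Python rather than Stata, and everything in it is hypothetical: the stratum counts in `FIRMS_IN_STRATUM` are invented for illustration, and `firm_probability`, `worker_probability` and `base_weight` are illustrative helper names, not Stata commands. It assumes, as in the design described, that 30 firms are drawn at random in each stratum and that workers are then drawn within each sampled firm.

```python
from fractions import Fraction

# Hypothetical frame: number of firms in each (firm_size, x_firm) stratum.
# These counts are illustrative assumptions, not data from the thread.
FIRMS_IN_STRATUM = {
    ("5-24", 0): 1200,
    ("5-24", 1): 300,
    ("500+", 0): 90,
    ("500+", 1): 60,
}
FIRMS_SAMPLED_PER_STRATUM = 30  # 30 firms drawn at random in every stratum


def firm_probability(stratum):
    """p1: probability that a firm in `stratum` is selected (30 of N firms)."""
    return Fraction(FIRMS_SAMPLED_PER_STRATUM, FIRMS_IN_STRATUM[stratum])


def worker_probability(n_selected, n_eligible):
    """p2: probability that an eligible worker in a sampled firm is selected."""
    return Fraction(n_selected, n_eligible)


def base_weight(stratum, n_selected, n_eligible):
    """myweight = (1/p1) * (1/p2), the inverse of the overall selection probability."""
    p1 = firm_probability(stratum)
    p2 = worker_probability(n_selected, n_eligible)
    return (1 / p1) * (1 / p2)
```

For example, a worker in a firm from a stratum of 1,200 firms, where 4 of 20 eligible workers were selected, has p1 = 30/1200 = 1/40 and p2 = 4/20 = 1/5, so `base_weight(("5-24", 0), 4, 20)` gives 40 × 5 = 200. Exact fractions are used only so the hand calculation and the code agree without rounding; in Stata the same quantity would simply be generated as a variable before it is passed to -svyset-.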

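The post-stratification adjustment new_weight = myweight*P_k/p_k from the reply can be sketched the same way. Again this is a hypothetical Python illustration, not how -svyset- implements post-stratification: `poststratify` is an invented helper, and the post-stratum labels and known population shares P_k are assumed inputs. Here p_k is computed as the weighted share of sampled workers falling in post-stratum k.

```python
from collections import defaultdict


def poststratify(weights, poststrata, population_share):
    """Return new weights = weight * P_k / p_k, one per worker.

    weights: base sampling weights (e.g. myweight), one per worker
    poststrata: post-stratum label k for each worker
    population_share: known population proportion P_k for each label
    """
    total = sum(weights)
    # Weighted count of sampled workers in each post-stratum.
    weighted_in_k = defaultdict(float)
    for w, k in zip(weights, poststrata):
        weighted_in_k[k] += w
    # p_k: weighted sample proportion of workers in post-stratum k.
    p = {k: s / total for k, s in weighted_in_k.items()}
    # Rescale each weight so the weighted proportions match P_k.
    return [w * population_share[k] / p[k] for w, k in zip(weights, poststrata)]
```

With five workers whose base weights are 200, 200, 80, 80 and 80, two in a "small" post-stratum and three in a "large" one, the weighted shares are 400/640 = 0.625 and 240/640 = 0.375. If the known population shares are 0.5 and 0.5, the adjusted weights become 160 for the small-firm workers and about 106.67 for the large-firm workers, so each weighted share is now 0.5; because the P_k sum to 1, the weights still total 640. Note that the postweight() option of -svyset- expects population sizes rather than proportions, so weights post-stratified that way sum to the population total instead.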
**References**:

- **st: SVY question**, from "Pavlopoulos, D." <d.pavlopoulos@vu.nl>
- **Re: st: SVY question**, from Steven Samuels <sjsamuels@gmail.com>
- **RE: st: SVY question**, from "Pavlopoulos, D." <d.pavlopoulos@vu.nl>
