Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: svyset, fpc command and stratified cluster sampling |
Date | Fri, 12 Mar 2010 14:20:18 -0600 |
On Fri, Mar 12, 2010 at 12:56 AM, Pierre DeBeaudrap <pdebeaudrap@gmail.com> wrote:

> I plan to analyse data from a community survey with cluster sampling and I
> am not completely sure of the way to set up svyset.
>
> The study design is a stratified two-stage cluster sampling. The strata were
> urban and rural, then villages (PSUs) were sampled with a probability
> proportional to the number of households. In the second stage, 30 households
> were randomly selected per village.
>
> We don't know the population size of the villages, but we know the population
> size of each stratum, and these figures are more reliable than the number of
> households.
>
> svyset village [pw=proba], strata(urban) fpc( )
>
> For fpc, do I have to enter the number of clusters per stratum or the
> population size for each stratum?

The finite population corrections are only applicable when you use simple random sampling (SRS). This is a piece of fine print in complex survey data analysis that is often not understood. Ideally, you'd want to construct the Horvitz-Thompson estimator with appropriate variance estimates computed from pairwise probabilities of selection. Realistically, you'd have to either assume sampling with replacement and ignore the fpc, or assume SRS (which is obviously a long stretch for your design) and enter the number of PSUs in the population for your fpc. The number of villages is probably fine for the rural part, and I would venture a guess that the urban stratum was sampled in a different way.

> Is there any further information I can provide for the level 2?

If you are willing to assume sampling with replacement, you don't need to provide any additional information. It won't even be used in estimation.

Interestingly, you should not worry too much about the weights. If the measure of size that your sampling people used (estimated number of households per PSU) is at least approximately right, you will have a self-weighting scheme that results in equal probabilities of selection. So your weights should be approximately constant.
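To see why the weights come out roughly constant, here is a short worked sketch of the selection probabilities. The notation (n, M_i, M, m) is an assumption introduced for illustration and does not appear in the original question; the first-stage probability is the usual approximation for PPS selection.

```latex
% Assumed notation (not from the original post):
%   n   = number of villages sampled in the stratum
%   M_i = number of households in village i (the PPS measure of size)
%   M   = \sum_i M_i, total number of households in the stratum
%   m   = 30, households sampled per selected village
\begin{align*}
\pi_i &\approx \frac{n M_i}{M}
  && \text{(stage 1: village } i \text{ selected with PPS)} \\
\pi_{j \mid i} &= \frac{m}{M_i}
  && \text{(stage 2: household } j \text{ within village } i\text{)} \\
\pi_{ij} &= \pi_i \, \pi_{j \mid i}
  \approx \frac{n M_i}{M} \cdot \frac{m}{M_i} = \frac{n m}{M},
  \qquad
  w_{ij} = \frac{1}{\pi_{ij}} \approx \frac{M}{n m}
\end{align*}
```

The village size M_i cancels, so every household in a stratum has about the same weight. If the size measure used at stage 1 differs from the actual household count found at stage 2, the weights vary by the ratio of the two.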
That's a pretty good design, actually.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
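For reference, the two svyset setups discussed in the reply could look roughly like the sketch below. It runs on a made-up toy dataset so that it is self-contained. The variable names village, urban and proba come from the question; pw, n_villages, y and all numeric values are hypothetical. It also assumes that proba holds the selection probability, in which case the pweight should be its inverse.

```stata
* Toy data, purely illustrative: 20 villages with 30 households each
clear
set seed 12345
set obs 600
gen village = ceil(_n/30)
gen urban   = village <= 8            // 8 urban and 12 rural villages (made up)
gen proba   = cond(urban, 0.02, 0.01) // hypothetical selection probabilities
gen y       = rnormal()               // placeholder outcome

* pweights are inverse probabilities of selection
gen pw = 1/proba

* Option 1: assume with-replacement sampling of villages, no fpc
svyset village [pweight = pw], strata(urban)
svy: mean y

* Option 2: treat villages as an SRS within stratum; the fpc variable
* holds the number of villages in the population of each stratum
gen n_villages = cond(urban, 120, 400) // hypothetical population counts
svyset village [pweight = pw], strata(urban) fpc(n_villages)
svy: mean y

* Check that the weights are roughly constant within each stratum
tabstat pw, by(urban) statistics(min max cv)
```

With real data, only the gen pw line, the svyset lines and the tabstat check would be needed; the second svyset call replaces the settings made by the first.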