
From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Clustering of secondary units in sampling design
Date: Mon, 5 Oct 2009 11:17:24 -0500

Stata uses an approximation to compute the variance estimator. You can technically ignore the subsequent stages if you select PSUs with replacement. In practice, samples are usually drawn without replacement, which increases precision (through the finite population correction). You can give Stata a more accurate design specification if you know the unit sizes and can supply them with the -fpc()- option; Stata will then treat the stages for which -fpc()- is provided as sampled without replacement (WOR).

Also, there is a difference between the variance and its estimator. For instance, in two-stage sampling, you will have two terms in your variance expression:

    total variance = V1[E2(statistic|stage1)] + E1[V2(statistic|stage1)]

where the first term involves the cluster means/totals, the second term involves the within-cluster variances, and the first term is typically larger than the second. The variance estimator formula based on the ultimate clusters at the first level wraps these two terms together and is expressed through the variability of the cluster means.

What you do in your design work is use the theoretical formula with the population variances (known, or accurately estimated from census or CPS data, for instance). While the separate terms may not show up in the estimator of the variance, you will see the differences between designs if you run simulations. The two designs you are suggesting will probably have variances that differ by something on the order of 2-10%, and that is probably comparable with the bias of the variance estimator. (This also means that the approximation that Stata uses works better for some designs than for others; in their textbook, Korn & Graubard <http://www.citeulike.org/user/ctacmo/article/553280> give some numeric examples to show where the variance estimator based on the ultimate clusters hits and where it misses.)
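
To make the two-term decomposition concrete, here is a minimal sketch in Python rather than Stata. Everything in it is an assumption for illustration: a balanced three-level model (municipality, census tract, individual) with made-up variance components, municipalities drawn with replacement (so no finite population correction), and the hypothetical helper names design_variance and simulate_once. It compares the theoretical variance of the sample mean for "1 tract x 100 individuals" against "10 tracts x 10 individuals" per municipality, and simulates the ultimate-cluster estimator, which only looks at the spread of the municipality means.

```python
import random
import statistics


def design_variance(m, t, k, var_muni, var_tract, var_indiv):
    """Theoretical variance of the sample mean, split into two terms.

    m PSUs, t tracts per PSU, k individuals per tract (balanced design).
    Assumes PSUs are drawn with replacement, i.e. no fpc.
    Returns (V1[E2], E1[V2]): the between-PSU and within-PSU terms.
    """
    between_psu = var_muni / m
    within_psu = var_tract / (m * t) + var_indiv / (m * t * k)
    return between_psu, within_psu


def simulate_once(m, t, k, sd_muni, sd_tract, sd_indiv, rng):
    """Draw one sample; return (sample mean, ultimate-cluster variance estimate)."""
    psu_means = []
    for _ in range(m):
        muni_effect = rng.gauss(0.0, sd_muni)
        tract_means = []
        for _ in range(t):
            tract_effect = rng.gauss(0.0, sd_tract)
            ys = [muni_effect + tract_effect + rng.gauss(0.0, sd_indiv)
                  for _ in range(k)]
            tract_means.append(statistics.fmean(ys))
        psu_means.append(statistics.fmean(tract_means))
    estimate = statistics.fmean(psu_means)
    # Ultimate-cluster estimator: only the variability of PSU means is used.
    v_hat = statistics.variance(psu_means) / m
    return estimate, v_hat


# Hypothetical variance components, chosen only for illustration.
VAR_MUNI, VAR_TRACT, VAR_INDIV = 1.0, 0.1, 4.0
M = 30  # municipalities in the sample

for label, t, k in [("1 tract x 100", 1, 100), ("10 tracts x 10", 10, 10)]:
    between, within = design_variance(M, t, k, VAR_MUNI, VAR_TRACT, VAR_INDIV)
    print(f"{label}: between={between:.4f} within={within:.4f} "
          f"total={between + within:.4f}")
```

With these made-up components the total variance is about 0.038 for the first design and 0.035 for the second, so the number of SSUs does matter at the design stage even though the estimator never sees the SSU identifiers. Averaging the v_hat returned by simulate_once over many replications should land close to the corresponding total, which is the sense in which the ultimate-cluster estimator wraps both terms together when PSUs are sampled with replacement.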
2009/10/5 Ángel Rodríguez Laso <angelrlaso@gmail.com>:
> Dear Statalisters,
>
> When analysing multistage survey data, Stata only needs a Primary
> Sampling Unit variable in the dataset, because the contribution to
> variance of any further clustering is incorporated to the PSU
> variance. This is based on the 'ultimate cluster method' for variance
> calculation.
>
> Does it mean that, when designing a sample, the number of Secondary
> Sampling Units is irrelevant for standard errors calculation purposes?
> That would mean that if, for example, one selects first municipalities
> (PSUs), then census tracts (SSUs) within municipalities and then
> individuals within census tracts, it is the same to go for a design
> with one census tract with 100 individuals per municipality than to go
> for 10 census tracts with 10 individuals each per municipality,
> although the second way increases the variability in the sample.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

