
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Variance estimation with clusters
Date: Thu, 8 Nov 2007 11:00:26 -0500

Steven makes some good points. I have a slightly different take:

1. Use the -fpc- option, but understand what it means. Imagine you "sampled" w/o replacement 100% of establishments and workers in a population; with the fpc's, all standard errors would be zero. This is as it should be; the svy SEs in a regression using the population are zero, because svy SEs represent deviations around the population value (not Fisher-Neyman notions of deviations about what might have been observed in the population with a different random sprinkling of regressors on individuals).

2. svy + panel = trouble. If you want to run a fixed-effect regression, consider -areg-, which allows pweights that vary over time and a -cluster- option.

3. I would use the time-specific weights, which measure the number of person-years each observation represents in the population of workers in the two years. The population is then not people, but people*time.

On 11/8/07, Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net> wrote:
> --
>
> Maury:
>
> I would only add to Austin's good advice:
>
> 1. If you are doing regressions and hypothesis tests, do not use the
> fpc terms. Imagine you had studied 100% of establishments and
> workers in a population; with the fpc's, all standard errors would be
> zero.
>
> 2. Stata's panel data and multi-level model -xt- commands will not
> respond to -svyset-. For panel data analysis, the options
> accommodating the survey design vary by command.
>
> 3. You should probably use the survey weights from year 1, but the
> study documentation may have other advice. Obviously these weights
> will not sum to the population size in either year 1 or year 2. If
> the survey deliberately over-sampled a class of workers which is the
> subject of your analysis (e.g. you wish to compare a minority to a
> majority group, and the survey over-sampled the minority group), you
> should probably ignore the survey weights altogether.
>
> -Steven
>
> On Nov 8, 2007, at 10:16 AM, Austin Nichols wrote:
>
> > Maury Gittleman <Gittleman.Maury@bls.gov>:
> > Just clustering on establishment is probably sufficient.
> >
> > You can also specify two levels of clustering with -svyset-, e.g.
> >
> > webuse stage5a
> > svyset su1 [pweight=pw], fpc(fpc1) || su2
> >
> > where su1 is your establishment id, fpc1 the number of distinct
> > employees in both years, and su2 is a person id.
> >
> > Usually the second level of clustering is largely irrelevant. But
> > not always...
> >
> > svyset su1 [pweight=pw], fpc(fpc1) strat(strat)
> > svy: reg yreg x?
> > est sto c1lev
> > svyset su1 [pw=pw], fpc(fpc1) str(str) || su2, fpc(fpc2)
> > svy: reg yreg x?
> > est sto c2lev
> > esttab *, mti
> >
> >
> > On 11/8/07, Gittleman, Maury - BLS <Gittleman.Maury@bls.gov> wrote:
> >> Hello,
> >>
> >> I have a question concerning Stata's approach to estimating standard
> >> errors in the presence of clustered survey data. The survey I'm using
> >> collects information on individual wages by first selecting
> >> establishments at random and then collecting information on multiple
> >> workers within each establishment. So, it is clear that, when I'm
> >> running regressions, I need to cluster on establishment.
> >>
> >> My question arises when I use two years of data from the same survey.
> >> For about 4/5 of the individuals, there will be data for two years, and
> >> I would expect that the correlation between the errors for any given
> >> individual will be higher than the correlation between the errors for
> >> two different individuals at the same establishment. My thinking is
> >> that I still want to define clusters by establishments, as the variance
> >> estimation is said to be robust to any arbitrary intra-cluster
> >> correlation.
> >>
> >> Is this the right way to go, or is there an alternative approach that
> >> might be superior?
> >>
> >> Thanks very much.
> >>
> >> Maury
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
>
> Steven Samuels
> sjhsamuels@earthlink.net
> 18 Cantine's Island
> Saugerties, NY 12477
> Phone: 845-246-0774
> EFax: 208-498-7441
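To make point 1 concrete, here is a minimal sketch on simulated data; all names (estab, fpc1, x, y) and the design (50 establishments, 4 workers each) are hypothetical. Every establishment in the "population" is sampled, so setting fpc1 to 50 makes the sampling fraction one. Under these assumptions the with-fpc standard error should come out as zero, while dropping fpc() gives the usual nonzero with-replacement SE.

```stata
* Hypothetical census: all 50 establishments (PSUs) are "sampled"
clear
set seed 12345
set obs 50
generate estab = _n             // hypothetical establishment id (PSU)
generate fpc1 = 50              // population count of PSUs = sample count, so f = 1
expand 4                        // four hypothetical workers per establishment
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()

* With the fpc: variance around the (fully observed) population value
svyset estab, fpc(fpc1)
svy: regress y x
scalar se_fpc = _se[x]

* Same design without the fpc: with-replacement approximation
svyset estab
svy: regress y x
scalar se_nofpc = _se[x]

display "SE with fpc: " se_fpc "   SE without fpc: " se_nofpc
```

Which of the two numbers is the one to report is exactly where Austin's and Steven's advice differ: the zero is the finite-population answer, the nonzero SE the one Steven suggests for regressions and hypothesis tests.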

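Point 2 (-areg- with pweights that vary over time, plus clustering) might look like the following minimal sketch on simulated data. The variable names (estab, person, year, w, x, y), the layout (40 establishments, 5 workers each, 2 years), and the random weights are all hypothetical stand-ins for the real survey. The worker fixed effect is absorbed and the SEs are clustered on establishment, as Maury proposed.

```stata
* Hypothetical two-year panel: 40 establishments x 5 workers x 2 years
clear
set seed 2007
set obs 40
generate estab = _n              // hypothetical establishment id (cluster)
expand 5                         // five workers per establishment
generate person = _n             // hypothetical worker id, nested in establishment
expand 2                         // each worker observed in two years
bysort person: generate year = _n

* Hypothetical person-year weights that differ across the two years
generate w = 1 + 2*runiform()
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()

* Absorb the worker fixed effect; cluster SEs on establishment
areg y x [pweight=w], absorb(person) vce(cluster estab)
display "clusters used: " e(N_clust)
```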

© Copyright 1996–2015 StataCorp LP