
From: Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Variance estimation with clusters
Date: Thu, 8 Nov 2007 14:24:11 -0500

---

I really like Austin's idea of using weights that represent person-years. Either weighting scheme will run into trouble if the probability that a worker is observed in year 2, given observation in year 1, is correlated with the analysis variables. Suppose, for example, that one is studying occupational health. If some workers leave their jobs before the year 2 survey because of health problems, those with two years of data will be healthier on average than the full year 1 sample. This is the well-known 'healthy worker effect'.
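The selection problem described above can be sketched with a small simulation. This is only an illustration under made-up assumptions: the variable names (health, in_year2), the sample size, and the attrition model are all hypothetical, chosen so that roughly four in five workers are seen in both years.

```stata
* Hypothetical simulation of the healthy worker effect.
* All names and parameter values are invented for illustration.
clear
set seed 12345
set obs 10000

* Latent health in year 1; higher values mean healthier workers
generate health = rnormal()

* Chance of still being observed in year 2 rises with health
generate byte in_year2 = runiform() < invlogit(1.4 + health)

* Mean health among everyone observed in year 1
summarize health
scalar mean_all = r(mean)

* Mean health among workers observed in both years
summarize health if in_year2
scalar mean_stayers = r(mean)

display "all year-1 workers: " %6.3f mean_all
display "observed in both years: " %6.3f mean_stayers
```

Under these assumptions the two-year subsample is healthier than the year 1 sample, and neither the year 1 weights nor person-year weights correct for that unless they are adjusted for the attrition.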

-Steven

On Nov 8, 2007, at 11:00 AM, Austin Nichols wrote:

Steven makes some good points. I have a slightly different take:

1. Use the -fpc- option, but understand what it means. Imagine you "sampled" w/o replacement 100% of establishments and workers in a population; with the fpc's, all standard errors would be zero. This is as it should be; the svy SEs in a regression using the population are zero, because svy SEs represent deviations around the population value (not Fisher-Neyman notions of deviations about what might have been observed in the population with a different random sprinkling of regressors on individuals).

2. svy + panel = trouble. If you want to run a fixed-effect regression, consider -areg- which allows pweights that vary over time and a -cluster- option.

3. I would use the time-specific weights which measure the number of person-years each observation represents in the population of workers in the two years. The population is then not people, but people*time.
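As a rough sketch of points 2 and 3, the following fits a person fixed-effects regression with -areg-, using weights that differ between the two years and standard errors clustered on establishment. The data are simulated, and every variable name (estab, person, year, pw, lnwage, x) and every number is an assumption made for illustration, not part of the survey under discussion.

```stata
* Hypothetical two-year worker panel; all names and values are made up.
clear
set seed 2007
set obs 200
generate estab = _n
generate u_estab = rnormal()

* 10 workers per establishment
expand 10
generate person = _n
generate u_person = rnormal()

* two survey years per worker
expand 2
bysort person: generate year = _n
generate byte year2 = (year == 2)

generate x = rnormal()
generate lnwage = 1 + 0.5*x + 0.1*year2 + u_estab + u_person + rnormal()

* invented weights that change from year 1 to year 2
generate pw = 50 + 10*year

* person fixed effects, time-varying pweights, SEs clustered on establishment
areg lnwage x year2 [pweight=pw], absorb(person) vce(cluster estab)
```

Because workers are nested within establishments here, clustering on establishment also allows for correlation between the two observations on the same person.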

On 11/8/07, Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net> wrote:

-- Maury: I would only add to Austin's good advice:

1. If you are doing regressions and hypothesis tests, do not use the fpc terms. Imagine you had studied 100% of establishments and workers in a population; with the fpc's, all standard errors would be zero.

2. Stata's panel data and multi-level model -xt- commands will not respond to -svyset-. For panel data analysis, the options accommodating the survey design vary by command.

3. You should probably use the survey weights from year 1; but the study documentation may have other advice. Obviously these weights will not sum to the population size in either year 1 or year 2. If the survey deliberately over-sampled a class of workers which is the subject of your analysis (e.g. you wish to compare a minority to a majority group, and the survey over-sampled the minority group), you should probably ignore the survey weights altogether.

-Steven

On Nov 8, 2007, at 10:16 AM, Austin Nichols wrote:

Maury Gittleman <Gittleman.Maury@bls.gov>:

Just clustering on establishment is probably sufficient. You can also specify two levels of clustering with -svyset- e.g.

    webuse stage5a
    svyset su1 [pweight=pw], fpc(fpc1) || su2

where su1 is your establishment id, fpc1 the number of distinct employees in both years, and su2 is a person id. Usually the second level of clustering is largely irrelevant. But not always...

    svyset su1 [pweight=pw], fpc(fpc1) strat(strat)
    svy: reg yreg x?
    est sto c1lev
    svyset su1 [pw=pw], fpc(fpc1) str(str) || su2, fpc(fpc2)
    svy: reg yreg x?
    est sto c2lev
    esttab *, mti

On 11/8/07, Gittleman, Maury - BLS <Gittleman.Maury@bls.gov> wrote:

Hello,

I have a question concerning Stata's approach to estimating standard errors in the presence of clustered survey data. The survey I'm using collects information on individual wages, by first selecting establishments at random, and then collecting information on multiple workers within each establishment. So, it is clear that, when I'm running regressions, I need to cluster on establishment.

My question arises when I use two years of data from the same survey. For about 4/5 of the individuals, there will be data for two years, and I would expect that the correlation between the errors for any given individual will be higher than the correlation between the errors for two different individuals at the same establishment. My thinking is that I still want to define clusters by establishments, as the variance estimation is said to be robust to any arbitrary intra-cluster correlation.

Is this the right way to go or is there an alternative approach that might be superior?

Thanks very much.

Maury
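The question above can be explored with a small simulated example that compares clustering on establishment with clustering on person. This is a sketch under invented assumptions: the data-generating process, the variable names (estab, person, x, lnwage) and all parameter values are hypothetical. The regressor is given an establishment-level component so that the choice of cluster matters.

```stata
* Hypothetical data: workers nested in establishments, two years each.
* All names and values are made up for illustration.
clear
set seed 811
set obs 200
generate estab = _n
generate u_estab = rnormal()
generate x_estab = rnormal()

* 10 workers per establishment, each with a person-level error component
expand 10
generate person = _n
generate u_person = rnormal()

* two yearly observations per worker
expand 2
generate x = x_estab + rnormal()
generate lnwage = 1 + 0.5*x + u_estab + u_person + rnormal()

* Clustering on establishment allows any correlation within an establishment,
* including the correlation between the two observations on one person
regress lnwage x, vce(cluster estab)
scalar se_estab = _se[x]

* Clustering on person ignores correlation across co-workers
regress lnwage x, vce(cluster person)
scalar se_person = _se[x]

display "SE clustering on establishment: " %7.4f se_estab
display "SE clustering on person:        " %7.4f se_person
```

In this setup the person-clustered standard error is the smaller of the two, because it misses the shared establishment component. Since persons are nested within establishments, the establishment-level cluster already covers the within-person correlation.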

Steven Samuels
sjhsamuels@earthlink.net
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


