Statalist



Re: st: Variance estimation with clusters


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Variance estimation with clusters
Date   Thu, 8 Nov 2007 11:00:26 -0500

Steven makes some good points.  I have a slightly different take:

1.  Use the -fpc()- option of -svyset-, but understand what it means.
Imagine you "sampled" without replacement 100% of establishments and
workers in a population; with the fpc's, all standard errors would be
zero.  This is as it should be: the svy SEs in a regression using the
whole population are zero, because svy SEs measure deviations around
the finite-population value (not Fisher-Neyman notions of deviations
about what might have been observed in the population with a different
random sprinkling of regressors on individuals).

2. svy + panel = trouble.  If you want to run a fixed-effects
regression, consider -areg-, which allows pweights that vary over time
and has a -cluster()- option.

3.   I would use the time-specific weights, which measure the number of
person-years each observation represents in the population of workers
in the two years.  The population is then not people, but people*time.
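
Points 2 and 3 can be sketched in Stata roughly as follows.  This is a
minimal illustration on simulated data, not the survey discussed in this
thread: every variable name (estab_id, person_id, year, tenure, pw_t,
lnwage) and every number in the data-generating step is hypothetical.
It absorbs person fixed effects with -areg-, uses a weight that differs
across a person's two years, and clusters on establishment.  The i.year
notation assumes Stata 11 or later.

```stata
* Simulated two-year worker panel; all names and values are made up
clear
set seed 12345
set obs 50                        // 50 establishments
gen estab_id = _n
gen u_estab  = rnormal()          // establishment-level error component
expand 10                         // 10 workers per establishment
gen person_id = _n
gen u_person  = rnormal()         // person-level error component
expand 2                          // two yearly rows per worker
bysort person_id: gen year = _n   // year = 1 or 2 within person
gen tenure = 10*runiform() + year
gen pw_t   = 50 + 50*runiform()   // year-specific weight, differs within person
gen lnwage = 2 + 0.03*tenure + u_estab + u_person + rnormal()

* Person fixed effects, time-varying pweights, SEs clustered on establishment
areg lnwage tenure i.year [pweight = pw_t], absorb(person_id) vce(cluster estab_id)
```

Clustering on estab_id allows arbitrary correlation among all rows from
the same establishment, which here includes the two rows for the same
worker.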

On 11/8/07, Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net> wrote:
> --
>
> Maury:
>
> I would only add to Austin's good advice:
>
> 1. If you are doing regressions and hypothesis tests, do not use the
> fpc terms. Imagine you had studied 100% of establishments and
> workers in a population; with the fpc's, all standard errors would be
> zero.
>
> 2. Stata's panel data and multi-level model -xt- commands will not
> respond to -svyset-.  For panel data analysis, the options
> accommodating the survey design vary by command.
>
> 3. You should probably use the survey weights from year 1; but the
> study documentation may have other advice. Obviously these weights
> will not sum to the population size in either year 1 or year 2. If
> the survey deliberately over-sampled a class of workers which is the
> subject of your analysis (e.g. you wish to compare a minority to a
> majority group, and the survey over-sampled the minority group), you
> should probably ignore the survey weights altogether.
>
> -Steven
>
> On Nov 8, 2007, at 10:16 AM, Austin Nichols wrote:
>
> > Maury Gittleman <Gittleman.Maury@bls.gov>:
> > Just clustering on establishment is probably sufficient.
> >
> > You can also specify two levels of clustering with -svyset- e.g.
> >
> > webuse stage5a
> > svyset su1 [pweight=pw], fpc(fpc1) || su2
> >
> > where su1 is your establishment id, fpc1 the number of distinct
> > employees in both years, and su2 is a person id.
> >
> > Usually the second level of clustering is largely irrelevant.  But
> > not always...
> >
> > svyset su1 [pweight=pw], fpc(fpc1) strat(strat)
> > svy: reg yreg x?
> > est sto c1lev
> > svyset su1 [pw=pw], fpc(fpc1) str(str) || su2, fpc(fpc2)
> > svy: reg yreg x?
> > est sto c2lev
> > esttab *, mti
> >
> >
> > On 11/8/07, Gittleman, Maury - BLS <Gittleman.Maury@bls.gov> wrote:
> >> Hello,
> >>
> >> I have a question concerning Stata's approach to estimating standard
> >> errors in the presence of clustered survey data.  The survey I'm using
> >> collects information on individual wages, by first selecting
> >> establishments at random, and then collecting information on multiple
> >> workers within each establishment.  So, it is clear that, when I'm
> >> running regressions, I need to cluster on establishment.
> >>
> >> My question arises when I use two years of data from the same survey.
> >> For about 4/5 of the individuals, there will be data for two years, and
> >> I would expect that the correlation between the errors for any given
> >> individual will be higher than the correlation between the errors for
> >> two different individuals at the same establishment.  My thinking is
> >> that I still want to define clusters by establishments, as the variance
> >> estimation is said to be robust to any arbitrary intra-cluster
> >> correlation.
> >>
> >> Is this the right way to go or is there an alternative approach that
> >> might be superior?
> >>
> >> Thanks very much.
> >>
> >> Maury
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
>
> Steven  Samuels
>
> sjhsamuels@earthlink.net
> 18 Cantine's Island
> Saugerties, NY 12477
> Phone: 845-246-0774
> EFax: 208-498-7441
>