
From: mdeidda@stern.nyu.edu
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: weights panel-survey data
Date: Wed, 15 Oct 2008 12:15:39 -0400

Dear Austin and Stas, thank you very much for your answers!

Manuela

Austin Nichols wrote:

You should of course listen to warnings from Stas about potential bias and inconsistency, and I have yet to meet the random effects model that survives a specification check via the -xtoverid- Hausman-style test (ssc desc xtoverid), but in practice, many researchers use the panel-specific weight in the first wave of a longitudinal survey and do not adjust for attrition (or replenishment). Something like:

    bys id (wave): replace weight=weight[1]

There are those who ignore weights on the assumption that a correctly specified model need not use sampling weights, and then there are the rest of us who never believe we have an exactly correctly specified model and therefore one must use weights. This schism appears to be largely faith-based, in the sense that little actual evidence of getting large biases not using weights or getting high-variance estimates using weights is actually adduced.

On Wed, Oct 15, 2008 at 2:08 AM, Stas Kolenikov <skolenik@gmail.com> wrote:

If you want to use survey weights, then you must have some finite population, or some hypothetical super-population, to which those weights are generalizing. I can see what pweights for each particular year are doing -- they allow you to compute (unbiased) totals for that year, thus leading to approximately unbiased averages. I am not sure I see an immediate meaning of the population-across-years idea. Are those the people who stayed in the population over those years? Meaning, they did not move to another territory and did not die. Most likely you won't have a reliable sample from that population -- you don't really know the probabilities of selection from that population unless you have pretty accurate figures about who moved and who died and who was born over those years, and all that demographics, and how it played out in the particular areas that were sampled.

Now further, all those weights and clusters and strata work really well when you need descriptive information such as means and ratios.
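To make the descriptive case concrete, here is a minimal sketch of wave-by-wave design-based estimates. Everything in it is an assumption for illustration: the data are simulated, and the names psu, stratum, weight, wave and income are hypothetical stand-ins for whatever the actual survey provides.

```stata
* Toy data, purely illustrative: 200 people, 3 waves, made-up variable names.
clear
set seed 12345
set obs 200
generate id = _n
generate psu = ceil(id/10)          // 20 hypothetical clusters
generate stratum = ceil(psu/5)      // 4 hypothetical strata
expand 3
bysort id: generate wave = _n
generate weight = 1 + runiform()    // stand-in for the wave-specific pweight
generate income = 10 + wave + rnormal()

* Declare the design, then get design-based descriptives wave by wave.
svyset psu [pweight = weight], strata(stratum)
svy: mean income, over(wave)
svy: total income, over(wave)
```

Each wave's estimate here refers to that wave's own population, which is the setting in which the per-year pweights have a clear meaning.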
When you want to do some modeling such as regression, you would have to deal with at least two sources of variability in your head -- model randomness (that's what we are all used to; in regression settings, this randomness will manifest itself in residuals) and sampling randomness (accounting for the fact that you took but a sample from the population). The existing methods, by and large, usually pretend the first source is not as important as the second one, and concentrate on design-consistent inference. Taking both into account is an exciting and largely unknown field to play in; depending on the nuances of your sampling and modeling, you can screw up your analysis either way and get inconsistent estimates and/or standard errors quite easily.

With longitudinal data and random effects, you would have to start thinking like, "OK, did I sample people with fixed u_i's, and then had some normal random e_ij on top of that [that's the sort of thinking that requires your weights to be constant within panel], or did I sample both fixed u_i and e_ij, or what?" (Here, the word "fixed" is used not in the sense of fixed vs random effects in panel data models, but rather in the sense of finite population sampling that assumes you can have perfect measurement of fixed individual characteristics -- such as a person's weight or eye color.)

Some basics are outlined by Binder and Roberts (2003, see http://www.citeulike.org/user/ctacmo/article/1036932); this productive couple of co-authors also gave a very informative presentation a few years back at the Joint Statistical Meetings addressing those strange populations that appear in longitudinal surveys; as far as I know from the authors, they have had difficulties finding an appropriate outlet for their work. Literally the only other serious paper about longitudinal aspects in surveys that I am aware of is Skinner & Vieira (2007, see http://www.citeulike.org/user/ctacmo/article/2862653).
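For reference, the u_i and e_ij mentioned above are the two error components of the usual random-effects regression. The notation below is generic; y, x and beta are not named in the message and are assumed here.

```latex
y_{ij} = x_{ij}'\beta + u_i + e_{ij},
\qquad i = 1,\dots,n \ \text{(persons)}, \quad j = 1,\dots,T \ \text{(waves)}
```

Under the first reading in the quoted question, only the persons (and hence the u_i) are sampled, which is why a single weight per person, constant across waves, is the natural companion to it.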
The bottom line is that depending on your field, you might want to forget about either the survey aspect or the nice panel data aspect, to make at least one part of your analysis fully compatible with at least one established paradigm (rather than trying to tweak both of them into something with unknown statistical properties). If you come from the economics side where everybody knows that random effects are biased, but fixed effects are inefficient, and you can do a Hausman test to check whether that's really true -- if that's the background where your survey has to fit into, you might want to stress the panel aspect, and just make sure that using whatever weights you have does not seem to give answers too far off. (Judging by your email, I tend to think that is your story.)

Unless you know that your referees will be after you for not using the weights (and you have checked that the weights are not wildly different -- say have a coefficient of variation of 0.2 or less, and not wildly correlated with the characteristics of interest, especially the dependent variable or residuals in your panel regressions), then you could just ignore the weights and see what comes out.

With some luck, you should be able to squeeze whatever weights you have in each wave into the -gllamm- modeling framework; type -findit gllamm- to figure it out. Another advantage of -gllamm- is that it lets you use all the data and not worry too much about attrition.

On 10/14/08, mdeidda@stern.nyu.edu <mdeidda@stern.nyu.edu> wrote:

Dear all

I am estimating a 3-year panel (random effects) using survey data. To get correct estimates I should use sampling weights, but the command xtreg, re does not allow me to use weights... I can use xtreg with the option mle, which allows me to use analytic weights. But Stata requires weights to be constant within panels. Is it correct to use the weights of the last year, and assume they are constant within panel? Or should I simply ignore weights?
I really appreciate your help,
Manuela

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
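The checks suggested in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the names id, wave, weight, x and y are hypothetical, and the weighted fit uses iweights because, as far as I understand the -xtreg- documentation, that is the weight type accepted with the mle option.

```stata
* Toy panel, purely illustrative: 200 people, 3 waves, made-up names.
clear
set seed 2008
set obs 200
generate id = _n
generate u = rnormal()                 // person-level effect
expand 3
bysort id: generate wave = _n
generate weight = 1 + 0.3*runiform()   // stand-in for wave-specific weights
generate x = rnormal()
generate y = 1 + 0.5*x + u + rnormal()

* Carry the first-wave weight across waves so it is constant within panel.
bysort id (wave): generate w1 = weight[1]

* Coefficient of variation of the weights (rule of thumb in the thread: <= 0.2).
quietly summarize w1 if wave == 1
scalar cv = r(sd)/r(mean)
display "CV of weights = " cv

* Are the weights related to the dependent variable?
correlate w1 y

* Compare unweighted and weighted random-effects fits.
xtset id wave
xtreg y x, mle
xtreg y x [iweight = w1], mle
```

If the two sets of coefficients are close and the weights vary little, that is the situation in which ignoring the weights is argued above to be defensible; a large gap would be a reason to look harder at the specification.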


**References**:

- **st: weights panel-survey data**, *From:* mdeidda@stern.nyu.edu
- **Re: st: weights panel-survey data**, *From:* "Stas Kolenikov" <skolenik@gmail.com>
- **Re: st: weights panel-survey data**, *From:* "Austin Nichols" <austinnichols@gmail.com>

