Statalist



Re: st: weights panel-survey data


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: weights panel-survey data
Date   Wed, 15 Oct 2008 09:56:24 -0400

You should of course listen to warnings from Stas about potential bias
and inconsistency, and I have yet to meet the random effects model
that survives a specification check via the -xtoverid- Hausman-style
test (ssc desc xtoverid), but in practice, many researchers use the
panel-specific weight in the first wave of a longitudinal survey and
do not adjust for attrition (or replenishment).  Something like:

bys id (wave): replace weight=weight[1]
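
A slightly more defensive version of the same idea, as a sketch only: it keeps the original wave-specific weight and stores the first observed weight in a new variable. Here w1 is a hypothetical name; id, wave, and weight are the names used in the line above. Note that for anyone who enters the panel after the first wave, weight[1] is the weight from the first wave in which that person is observed.

```stata
* Sketch under assumptions: data are in long form, one row per id-wave.
* w1 is a hypothetical new variable; the original weight is left untouched.
bysort id (wave): generate double w1 = weight[1]

* Check that the new weight really is constant within panel.
bysort id (w1): assert w1[1] == w1[_N]

* Count observations in panels whose first observed weight is missing.
count if missing(w1)
```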

There are those who ignore weights on the assumption that a correctly
specified model need not use sampling weights, and then there are the
rest of us, who never believe we have an exactly correctly specified
model and therefore use weights.  This schism appears to be largely
faith-based, in the sense that little actual evidence is adduced of
large biases from not using weights or of high-variance estimates
from using them.
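
One rough way to look for such evidence in a given dataset is to fit the same model with and without weights and compare, alongside the spread of the weights themselves. The following is only a sketch: y and x are hypothetical outcome and regressor names, weight is assumed to have been made constant within id as above, and the fixed-effects estimator is used only because -xtreg, fe- accepts pweights that are constant within panel.

```stata
* Sketch under assumptions: y and x are hypothetical variable names;
* weight is already constant within id (see the line above).
xtset id wave

* Spread of the weights, using one observation per panel.
egen byte onepp = tag(id)
quietly summarize weight if onepp
display "CV of weights = " r(sd)/r(mean)

* Same model without and with weights.
xtreg y x, fe vce(cluster id)
estimates store unweighted
xtreg y x [pweight = weight], fe vce(cluster id)
estimates store weighted

* Coefficients and standard errors side by side.
estimates table unweighted weighted, b se
```

If the two sets of estimates are close, the weighting decision probably matters little for this model; a large gap is itself worth investigating.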

On Wed, Oct 15, 2008 at 2:08 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
> If you want to use survey weights, then you must have some finite
> population, or some hypothetical super-population, to which those
> weights are generalizing. I can see what pweights for each particular
> year are doing -- they allow you to compute (unbiased) totals for that
> year, thus leading to approximately unbiased averages. I am not sure I
> see an immediate meaning of the population-across-years idea. Are
> those the people who stayed in the population over those years?
> Meaning, they did not move to another territory and did not die. Most
> likely you won't have a reliable sample from that population -- you
> don't really know the probabilities of selection from that population
> unless you have pretty accurate figures about who moved and who died
> and who was born over those years, and all that demographics, and how
> it played out in the particular areas that were sampled.
>
> Now further, all those weights and clusters and strata work really
> well when you need descriptive information such as means and ratios.
> When you want to do some modeling such as regression, you would have
> to deal with at least two sources of variability in your head -- model
> randomness (that's what we are all used to; in regression settings,
> this randomness will manifest itself in residuals) and sampling
> randomness (accounting for the fact that you took but a sample from
> population). The existing methods, by and large, usually pretend the
> first source is not as important as the second one, and concentrate on
> design-consistent inference. Taking both into account is an exciting
> and largely unknown field to play; depending on the nuances of your
> sampling and modeling, you can screw up your analysis either way and
> get inconsistent estimates and/or standard errors quite easily. With
> longitudinal data and random effects, you would have to start thinking
> like, "OK, did I sample people with fixed u_i's, and then had some
> normal random e_ij on top of that [that's the sort of the thinking
> that requires your weights to be constant within panel], or did I
> sample both fixed u_i and e_ij, or what?" (Here, the word "fixed" is
> used not in the sense of fixed vs random effects in panel data models,
> but rather in the sense of finite population sampling that assumes you
> can have perfect measurement of fixed individual characteristics --
> such as a person's weight or eye color.)
>
> Some basics are outlined by Binder and Roberts (2003, see
> http://www.citeulike.org/user/ctacmo/article/1036932); this productive
> couple of co-authors also gave a very informative presentation a few
> years back at Joint Statistical Meetings addressing those strange
> populations that appear in longitudinal surveys; as far as I know from
> the authors, they have had difficulties finding an appropriate outlet
> for their work. Literally the only other serious paper about
> longitudinal aspects in surveys that I am aware of is Skinner & Vieira
> (2007, see http://www.citeulike.org/user/ctacmo/article/2862653).
>
> The bottom line is that, depending on your field, you might want to
> forget about either the survey aspect or the nice panel data aspect,
> to make at least one part of your analysis fully compatible with at
> least one established paradigm (rather than trying to tweak both of
> them into something with unknown statistical properties). If you come
> from the economics side, where everybody knows that random effects are
> biased, but fixed effects are inefficient, and you can do a Hausman test
> to check whether that's really true -- if that's the background where
> your survey has to fit into, you might want to stress the panel
> aspect, and just make sure that using whatever weights you have does
> not seem to give answers too far off. (Judging by your email, I tend
> to think that is your story.) Unless you know that your referees will
> be after you for not using the weights, and provided you have checked
> that the weights are not wildly different (say, have a coefficient of
> variation of 0.2 or less) and not wildly correlated with the
> characteristics of interest (especially the dependent variable or
> residuals in your panel regressions), you could just ignore the
> weights and see what comes out.
>
> With some luck, you should be able to squeeze whatever weights you
> have in each wave into the -gllamm- modeling framework; type -findit
> gllamm- to figure it out. Another advantage of -gllamm- is that it
> lets you use all the data and not worry too much about attrition.
>
> On 10/14/08, mdeidda@stern.nyu.edu <mdeidda@stern.nyu.edu> wrote:
>> Dear all
>>  I am estimating a 3 year panel (random effect) using survey data. To get
>> correct estimates I should use sampling weights, but the command xtreg, re
>> does not allow me to use weights... I can use xtreg with the option mle,
>> which allows me to use analytic weights. But Stata requires weights to be
>> constant within panels.
>>  Is it correct to use the weights of the last year, and assume they are
>> constant within panel?
>>  Or should I simply ignore weights?
>>  I really appreciate your help,
>>  Manuela