
Re: st: weights panel-survey data

From	[email protected]
To	[email protected]
Subject	Re: st: weights panel-survey data
Date	Wed, 15 Oct 2008 12:15:39 -0400

Dear Austin and Stas,
thank you very much for your answers!
Manuela

Austin Nichols wrote:

You should of course listen to warnings from Stas about potential bias
and inconsistency, and I have yet to meet the random effects model
that survives a specification check via the -xtoverid- Hausman-style
test (ssc desc xtoverid), but in practice, many researchers use the
panel-specific weight in the first wave of a longitudinal survey and
do not adjust for attrition (or replenishment).  Something like:

* within each panel (sorted by wave), overwrite weight with its first-wave value
bys id (wave): replace weight=weight[1]
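
A non-destructive variant of the line above, as a minimal sketch: it keeps the original wave-specific weight, stores the first-wave weight in a new variable, and then checks that the result really is constant within panel. The names id, wave, and weight come from the line above; w_first is a hypothetical name.

```stata
* Assumes one row per person-wave, with id, wave, and weight already in memory.
* w_first is a hypothetical new variable; the original weight is left untouched.
bysort id (wave): generate double w_first = weight[1]

* Sanity check: after sorting on w_first within id, the smallest and largest
* values coincide only if the weight is constant within panel.
bysort id (w_first): assert w_first[1] == w_first[_N]
```

Note that weight[1] is the weight from the first wave in which a person is observed, which is not wave 1 for anyone who joins the panel later.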

There are those who ignore weights on the assumption that a correctly
specified model need not use sampling weights, and then there are the
rest of us who never believe we have an exactly correctly specified
model and therefore use weights.  This schism appears to be
largely faith-based, in the sense that little actual evidence is
adduced either of large biases from not using weights or of
high-variance estimates from using them.
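
One way to gather such evidence for a given data set is simply to fit the same model with and without weights and put the results side by side. The following is only a sketch under assumptions: y, x1, and x2 are hypothetical variable names, pw_first is a hypothetical first-wave weight built as in the line quoted earlier, and the fixed-effects estimator is used here because -xtreg, fe- accepts pweights that are constant within panel.

```stata
* Assumed names: id, wave, weight (from the thread); y, x1, x2, pw_first (hypothetical)
xtset id wave
bysort id (wave): generate double pw_first = weight[1]

* Unweighted fit, with standard errors clustered on the person
xtreg y x1 x2, fe vce(cluster id)
estimates store unweighted

* Same model using the (panel-constant) first-wave weight
xtreg y x1 x2 [pweight = pw_first], fe vce(cluster id)
estimates store weighted

* Coefficients and standard errors side by side
estimates table unweighted weighted, b(%9.4f) se(%9.4f)
```

Large shifts in the coefficients would suggest the weights matter for the model as specified; much wider standard errors in the weighted column would show the variance cost of using them.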

On Wed, Oct 15, 2008 at 2:08 AM, Stas Kolenikov <[email protected]> wrote:

If you want to use survey weights, then you must have some finite
population, or some hypothetical super-population, to which those
weights are generalizing. I can see what pweights for each particular
year are doing -- they allow you to compute (unbiased) totals for that
year, thus leading to approximately unbiased averages. I am not sure I
see an immediate meaning of the population-across-years idea. Are
those the people who stayed in the population over those years?
Meaning, they did not move to another territory and did not die. Most
likely you won't have a reliable sample from that population -- you
don't really know the probabilities of selection from that population
unless you have pretty accurate figures about who moved and who died
and who was born over those years, and all that demographics, and how
it played out in the particular areas that were sampled.

Now further, all those weights and clusters and strata work really
well when you need descriptive information such as means and ratios.
When you want to do some modeling such as regression, you would have
to deal with at least two sources of variability in your head -- model
randomness (that's what we are all used to; in regression settings,
this randomness will manifest itself in residuals) and sampling
randomness (accounting for the fact that you took but a sample from
the population). The existing methods, by and large, pretend the
first source is not as important as the second one, and concentrate on
design-consistent inference. Taking both into account is an exciting
and largely unexplored field to play in; depending on the nuances of
your sampling and modeling, you can screw up your analysis either way
and get inconsistent estimates and/or standard errors quite easily.
With longitudinal data and random effects, you would have to start
thinking like, "OK, did I sample people with fixed u_i's, and then had
some normal random e_ij on top of that [that's the sort of thinking
that requires your weights to be constant within panel], or did I
sample both fixed u_i and e_ij, or what?" (Here, the word "fixed" is
used not in the sense of fixed vs random effects in panel data models,
but rather in the sense of finite population sampling that assumes you
can have perfect measurement of fixed individual characteristics --
such as a person's weight or eye color.)
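
For reference, the u_i and e_ij above presumably refer to the usual random-intercept model, sketched below. The symbols y, x, and beta are not in the original message and are assumed here, with i indexing persons and j indexing waves.

```latex
\[
  y_{ij} = x_{ij}'\beta + u_i + e_{ij},
  \qquad i = 1,\dots,n \ \text{(persons)}, \quad j = 1,\dots,T_i \ \text{(waves)}
\]
% Reading 1: the design samples persons, each carrying a fixed u_i, and the
%            e_{ij} are model-random draws on top; a single weight w_i per
%            person (constant within panel) matches this view.
% Reading 2: the design samples person-wave pairs, so both u_i and e_{ij}
%            are fixed finite-population quantities and the weights w_{ij}
%            may differ across waves.
```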

Some basics are outlined by Binder and Roberts (2003, see
http://www.citeulike.org/user/ctacmo/article/1036932); this productive
couple of co-authors also gave a very informative presentation a few
years back at the Joint Statistical Meetings addressing those strange
populations that appear in longitudinal surveys; as far as I know from
the authors, they have had difficulties finding an appropriate outlet
for their work. Literally the only other serious paper about
longitudinal aspects in surveys that I am aware of is Skinner & Vieira
(2007, see http://www.citeulike.org/user/ctacmo/article/2862653).

The bottom line is that depending on your field, you might want to
forget about either the survey aspect or the nice panel data aspect,
to make at least one part of your analysis fully compatible with at
least one established paradigm (rather than trying to tweak both of
them into something with unknown statistical properties). If you come
from the economics side where everybody knows that random effects are
biased, but fixed effects are inefficient, and you can do a Hausman
test to check whether that's really true -- if that's the background
your survey has to fit into, you might want to stress the panel
aspect, and just make sure that using whatever weights you have does
not seem to give answers too far off. (Judging by your email, I tend
to think that is your story.) Unless you know that your referees will
be after you for not using the weights, and provided you have checked
that the weights are not wildly different -- say, have a coefficient
of variation of 0.2 or less -- and not wildly correlated with the
characteristics of interest, especially the dependent variable or
residuals in your panel regressions, you could just ignore the
weights and see what comes out.
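
The checks mentioned in this paragraph could be run roughly as follows. This is a sketch only: y, x1, x2, and ehat are hypothetical names, the 0.2 threshold is the rule of thumb from the paragraph above, and the Hausman comparison uses the official -hausman- command on unweighted fits.

```stata
* Assumed names: id, wave, weight (from the thread); y, x1, x2, ehat (hypothetical)
xtset id wave

* Coefficient of variation of the weights (compare with the 0.2 rule of thumb)
quietly summarize weight
display "CV of weights = " %5.3f r(sd)/r(mean)

* Fixed vs random effects: the usual Hausman comparison
quietly xtreg y x1 x2, fe
estimates store fe
quietly xtreg y x1 x2, re
estimates store re
hausman fe re

* Correlation of the weights with the outcome and with the RE residuals
estimates restore re
predict double ehat, e
correlate weight y ehat
```

Running -summarize weight- separately by wave (for example with a bysort wave: prefix) would show whether the spread of the weights changes over time.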

With some luck, you should be able to squeeze whatever weights you
have in each wave into the -gllamm- modeling framework; type -findit
gllamm- to figure it out. Another advantage of -gllamm- is that it
lets you use all the data and not worry too much about attrition.
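
A rough sketch of what a weighted random-intercept fit in -gllamm- might look like is given below. It rests on several assumptions that should be checked against -help gllamm-: that the pweight() option takes a stub name and expects variables stub1 and stub2 holding the level-1 and level-2 weights, that a person-level weight (here the first-wave weight) is an acceptable level-2 weight, and that setting the level-1 weights to 1 is reasonable for the design. The names y, x1, x2, pw1, and pw2 are hypothetical.

```stata
* ssc install gllamm        // one-time install, if not already present

* Level-1 weight (waves within person): set to 1, an assumption of this sketch
generate double pw1 = 1

* Level-2 weight (persons): first-wave weight, constant within panel
bysort id (wave): generate double pw2 = weight[1]

* Random-intercept linear model with person-level random effect;
* pweight(pw) is assumed to pick up pw1 and pw2
gllamm y x1 x2, i(id) pweight(pw) adapt
```

Using genuinely wave-specific weights at level 1 would require conditional (and suitably scaled) weights rather than the raw cross-sectional ones, which is where the "with some luck" caveat applies.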

On 10/14/08, [email protected] <[email protected]> wrote:

Dear all
I am estimating a 3-year panel (random effects) using survey data. To get
correct estimates I should use sampling weights, but the command xtreg, re
does not allow me to use weights... I can use xtreg with the option mle,
which allows me to use analytic weights. But Stata requires weights to be
constant within panels.
Is it correct to use the weights of the last year, and assume they are
constant within panel?
Or should I simply ignore weights?
I really appreciate your help,
Manuela

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

