# Re: st: weights panel-survey data

 From "Stas Kolenikov" To statalist@hsphsun2.harvard.edu Subject Re: st: weights panel-survey data Date Wed, 15 Oct 2008 01:08:51 -0500

```If you want to use survey weights, then you must have some finite
population, or some hypothetical super-population, to which those
weights are generalizing. I can see what pweights for each particular
year are doing -- they allow you to compute (unbiased) totals for that
year, thus leading to approximately unbiased averages. I am not sure I
see an immediate meaning of the population-across-years idea. Are
those the people who stayed in the population over those years?
Meaning, they did not move to another territory and did not die. Most
likely you won't have a reliable sample from that population -- you
don't really know the probabilities of selection from that population
unless you have pretty accurate figures about who moved and who died
and who was born over those years, and all that demographics, and how
it played out in the particular areas that were sampled.

Now further, all those weights and clusters and strata work really
well when you need descriptive information such as means and ratios.
When you want to do some modeling such as regression, you would have
to deal with at least two sources of variability in your head -- model
randomness (that's what we are all used to; in regression settings,
this randomness will manifest itself in residuals) and sampling
randomness (accounting for the fact that you took but a sample from
population). The existing methods, by and large, usually pretend the
first source is not as important as the second one, and concentrate on
design-consistent inference. Taking both into account is an exciting
and largely unexplored field to play in; depending on the nuances of your
sampling and modeling, you can screw up your analysis either way and
get inconsistent estimates and/or standard errors quite easily. With
longitudinal data and random effects, you would have to start thinking
like, "OK, did I sample people with fixed u_i's, and then had some
normal random e_ij on top of that [that's the sort of thinking
that requires your weights to be constant within panel], or did I
sample both fixed u_i and e_ij, or what?" (Here, the word "fixed" is
used not in the sense of fixed vs random effects in panel data models,
but rather in the sense of finite population sampling that assumes you
can have perfect measurement of fixed individual characteristics --
such as a person's weight or eye color.)

Some basics are outlined by Binder and Roberts (2003, see
http://www.citeulike.org/user/ctacmo/article/1036932); this productive
couple of co-authors also gave a very informative presentation a few
years back at the Joint Statistical Meetings addressing those strange
populations that appear in longitudinal surveys; as far as I know from
the authors, they have had difficulties finding an appropriate outlet
for their work. Literally the only other serious paper about
longitudinal aspects in surveys that I am aware of is Skinner & Vieira
(2007, see http://www.citeulike.org/user/ctacmo/article/2862653).

The bottom line is that depending on your field, you might want to
forget about either the survey aspect or the nice panel data aspect,
to make at least one part of your analysis fully compatible with at
least one established paradigm (rather than trying to tweak both of
them into something with unknown statistical properties). If you come
from the economics side where everybody knows that random effects are
biased, but fixed effects are inefficient, and you can do a Hausman test
to check whether that's really true -- if that's the background where
your survey has to fit into, you might want to stress the panel
aspect, and just make sure that using whatever weights you have does
not seem to give answers too far off. (Judging by your email, I tend
to think that is your story.) Unless you know that your referees will
be after you for not using the weights, you could just ignore the
weights and see what comes out, provided you have checked that the
weights are not wildly different (say, have a coefficient of
variation of 0.2 or less) and are not wildly correlated with the
characteristics of interest, especially the dependent variable or
residuals in your panel regressions.

With some luck, you should be able to squeeze whatever weights you
have in each wave into the -gllamm- modeling framework; type -findit
gllamm- to figure it out. Another advantage of -gllamm- is that it
lets you use all the data and not worry too much about attrition.

On 10/14/08, mdeidda@stern.nyu.edu <mdeidda@stern.nyu.edu> wrote:
> Dear all
>  I am estimating a 3 year panel (random effect) using survey data. To get
> correct estimates I should use sampling weights, but the command xtreg, re
> does not allow me to use weights...I can use xtreg with the option mle,
> which allow me to use analytic weights. But stata requires weights to be
> constant within panels.
>  Is it correct to use the weights of the last year, and assume they are
> constant within panel?
>  Or should I simply ignore weights?
>  I really appreciate your help,
>  Manuela
>
>
>  *
>  *   For searches and help try:
>  *   http://www.stata.com/help.cgi?search
>  *   http://www.stata.com/support/statalist/faq
>  *   http://www.ats.ucla.edu/stat/stata/
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
```
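
The weight check suggested in the reply (coefficient of variation of 0.2 or less, weights not strongly correlated with the dependent variable or residuals) can be sketched roughly as follows. This is only an illustration, written in Python rather than Stata; the data and all function names are hypothetical, and the 0.2 cutoff is the rule of thumb from the post, not a formal test. The post gives no cutoff for the correlation, so none is applied here.

```python
import math

def weighted_mean(y, w):
    """Design-weighted (pweight-style) mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def coef_variation(w):
    """Sample coefficient of variation of the weights (sd / mean)."""
    n = len(w)
    mean = sum(w) / n
    var = sum((wi - mean) ** 2 for wi in w) / (n - 1)
    return math.sqrt(var) / mean

def pearson_corr(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical data: one wave's sampling weights and the dependent variable.
weights = [100.0, 110.0, 95.0, 105.0, 90.0, 120.0]
outcome = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5]

cv = coef_variation(weights)
rho = pearson_corr(weights, outcome)
wmean = weighted_mean(outcome, weights)
umean = sum(outcome) / len(outcome)

# Rule of thumb from the post: CV of the weights of 0.2 or less.
weights_look_mild = cv <= 0.2

print(f"CV of weights: {cv:.3f} (mild: {weights_look_mild})")
print(f"corr(weights, outcome): {rho:.3f}")
print(f"weighted mean: {wmean:.3f}, unweighted mean: {umean:.3f}")
```

In practice the same correlation would also be computed against the residuals of the fitted panel regression, and the weighted and unweighted estimates compared to see whether using the weights moves the answers much.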