Re: st: panel data sets based on complex survey design

From   Jenkins S P <>
Subject   Re: st: panel data sets based on complex survey design
Date   Tue, 25 May 2004 08:57:01 +0100 (BST)

On Tue, 25 May 2004 wrote:

> Users, I posted this query a couple of weeks ago but it didn't seem to spark
> any responses. I'm posting it again in the hope someone has some thoughts on
> the matter.

The Statalist FAQ gives advice on why posts may go unanswered, and on
whether to re-post to the whole list without rewriting or reconsidering
the question.

> I am about to undertake analysis on the Household, Income and Labour
> Dynamics in Australia (HILDA) survey using Stata version 8 (which I have
> recently obtained access to). HILDA is a panel data set of Australian
> households and individuals. I was hoping to find that the latest version of
> Stata had the capability to deal with data that was both longitudinal in
> nature and of a complex survey design (I guess a combination of the xt and
> svy commands). However, my initial scan of the guides doesn't reveal
> anything that can deal with both these issues simultaneously.
> How do other users analysing HILDA (and other longitudinal surveys) deal
> with this issue of longitudinal data when there are significant
> stratification and clustering issues? Any thoughts greatly appreciated.

There probably aren't commands routinely available because it is not clear
that one should account for complex survey design in a panel in this
manner. The approach in effect assumes that design effects can be dealt
with using appropriate weights (and by accounting for the clustering and
stratification).  But where do the weights come from?  Usually they are
all-purpose weights (and may not even be special longitudinal weights),
and derived, broadly speaking, from regressions of the probability of
retention with loads of RHS variables.
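
As a rough sketch of what such a weight construction involves (not how any
particular survey's producers actually do it), one could fit a retention
logit and inflate the wave-1 weight by the inverse of the predicted
retention probability. All variable names here (retained, age, female,
hhsize, w1weight, lweight) are made up for illustration.

```stata
* Hypothetical variables:
*   retained = 1 if the wave-1 respondent is still in the panel later on
*   w1weight = wave-1 design weight
logit retained age female hhsize
* Default prediction after -logit- is the probability of a positive outcome
predict double phat
* Inverse-probability adjustment, defined only for retained cases
generate double lweight = w1weight / phat if retained == 1
```

Real longitudinal weights typically use many more RHS variables and
further adjustments; this only shows the basic idea.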

Economists and others often approach this differently (call it a
'modelling' approach rather than a 'weighting' approach): they model the
retention probability jointly with the process of interest. The key
difference from the weighting approach is that one allows for correlations
between the unobservable factors determining retention and the process of
interest.  (Of course there are arguments about identification to be
resolved as well.) See Journal of Human Resources 1998 special issue on
attrition etc in longitudinal surveys, and references therein.

If you are simply after crosstabs and other descriptives from longitudinal
data with an originally complex design, then it is more common to take a
weighting approach, and these commands are available in Stata of course
(-svytab- etc.).  Again there is the issue of the weights (which ones).
Which variables get used in the -svyset-ting of the clustering and
stratification when you have multiple waves?  Not totally clear, but a
survey statistician colleague of mine once recommended that you use the
cluster/psu and strata from wave 1 of the survey.
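
In Stata 8 syntax, that recommendation might look as follows, assuming the
wave-1 design variables have already been attached to every record. The
names w1psu, w1strata, lweight, empstat1 and empstat2 are hypothetical, and
which weight to supply is exactly the open question raised above.

```stata
* Stata 8 syntax; all variable names are hypothetical.
* Declare the design using the wave-1 PSU and strata identifiers.
svyset [pweight=lweight], strata(w1strata) psu(w1psu)
* Design-based two-way table, e.g. status at wave 1 by status at wave 2
svytab empstat1 empstat2, row
```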

[NB1 issues get much more complicated when one e.g. pools annual
transitions from successive years of a panel survey. It is not clear what
weights one should use in this case -- the longitudinal weight from the
second year in each case?]

[NB2 Working out which type of weights to use is a tricky business.
Different household panels provide different sorts of weights,
cross-sectional and longitudinal, and for enumerated individuals, adult
respondents, and households.  The PSID does not distinguish between
cross-sectional and longitudinal weights, whereas the BHPS and the GSOEP
do -- though the last two provide longitudinal weights in different ways.
I don't know what sort of weights HILDA use.]

Stephen (from the home of the BHPS)
Professor Stephen P. Jenkins <>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester CO4 3SQ, UK
Phone: +44 1206 873374.  Fax: +44 1206 873151.

