# Re: st: Which weight

From: Steven Joel Hirsch Samuels

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Which weight

Date: Sun, 14 Oct 2007 14:00:57 -0400

On Oct 8, 2007, at 11:44 AM, Sergiy Radyakin wrote:

```
Dear Steven,

Is there also a straightforward answer in the following situations?

Situation A: N households were chosen at time t0 and followed as a
panel. M households were added to the panel at t1 (t0<t1). At any
point t I have cross-sectional weights and a probability to stay in the
sample for each household. At any moment t an event of interest can
happen in the household (e.g. a child leaves the household). In this
case I add the characteristics of the household at time t [and maybe
t-1] to "my" sample. Which weights should I use? I obviously can't
choose weights at t0 because not all households were in the sample at
t0, and I can't use the weights at t1 because some of the households
for which the event of interest occurred before t1 have dropped
out before t1 due to panel mortality.

Situation B: N households are chosen at time t0 and M households are
chosen at t1. pweights are given. However, during the time between t0
and t1, the population from which the samples were drawn (e.g. the
population of a country) has changed (e.g. doubled in size). I am
working with a pooled sample of households (N+M). Which weights can I
use?
If I am working with one subpopulation only (e.g. men) and the
proportion of these cases has changed, can I still pool observations?
(E.g. women/men=50/50 in t0 but women/men=60/40 in t1). If yes, what
interpretation do I give to the estimates then? [this is not a panel
case]

Is there any good online guide on longitudinal weights? Preferably
with plain examples on how to deal with different situations as
outlined above?

```

Sergiy,

There is no easy answer to your questions. I do not know of a good text on panel-study weighting. I learned much of what I know by studying the documentation for some of the large panel studies.

Situation A

Recall that a sample weight is, roughly, the number of population members 'represented' by the observation; the sum of sample weights should equal the total number of population members.
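To make the "number of population members represented" idea concrete, here is a minimal sketch assuming the simplest case: a base (design) weight equal to the inverse of the selection probability, with no nonresponse or calibration adjustments. The two-stratum design and the function names are hypothetical.

```python
# Minimal sketch: base weight = 1 / selection probability, and the weights
# sum to an estimate of the population size. Hypothetical design; real survey
# weights usually also carry nonresponse and calibration adjustments.

def design_weights(selection_probs):
    """Return 1/pi for each sampled unit's selection probability pi."""
    return [1.0 / p for p in selection_probs]

def estimated_population_total(weights):
    """Sum of weights = estimated number of population members represented."""
    return sum(weights)

# Hypothetical design with two strata:
#   stratum 1: 1,000 households, 10 sampled  -> pi = 0.01  (each represents 100)
#   stratum 2: 4,000 households, 20 sampled  -> pi = 0.005 (each represents 200)
probs = [0.01] * 10 + [0.005] * 20
w = design_weights(probs)

print(estimated_population_total(w))  # 5000.0 = 1,000 + 4,000 households
```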

If you need data only at calendar time 't', you would use the cross-sectional weight for 't'. The data set's authors will have carefully calibrated these weights. For example, panels may rotate every six months, but the weights may be appropriate to the entire calendar-year population. The documentation should make this clear.

However, you want to use information from period 't-1' to predict outcomes in 't'. So you require observations with data at both 't-1' and 't'; these are a subset of the households with data at 't', and also of those with data at 't-1'. Summed over this subset, neither the 't' nor the 't-1' weights will add to the population totals for those periods, so neither is a proper weight. Yet if the sampling protocol did not change drastically over your period of interest, these weights should be approximately proportional to the proper weights. I recommend that you use the 't' weights.
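Proportional weights are good enough for most analyses because weighted means, ratios and regression coefficients do not change when every weight is multiplied by the same positive constant; only estimated totals do. The sketch below illustrates this with made-up weights; the rescaling step assumes you know the population total from some other source.

```python
# Sketch: weighted means are invariant to multiplying all weights by one
# constant; only totals change. All numbers are hypothetical.

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def rescale_to_total(weights, target_total):
    """Multiply all weights by one constant so that they sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]

# Hypothetical 't' weights for households observed at both t-1 and t.
w_t = [120.0, 80.0, 200.0, 100.0]
y = [1.0, 0.0, 1.0, 0.0]          # e.g. 1 = event of interest occurred

# These weights sum to 500; suppose (assumption) the population total is 2,000.
w_scaled = rescale_to_total(w_t, 2000.0)

print(weighted_mean(y, w_t))       # 0.64
print(weighted_mean(y, w_scaled))  # same weighted mean
print(sum(w_t), sum(w_scaled))     # totals differ: 500.0 vs 2000.0
```

The same invariance holds for weighted regression coefficients, so the choice of 't' weights mainly matters through their relative sizes, not their sum.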

Situation B

An answer will depend on the purpose of your analysis. Is it 'descriptive', meaning that you are interested only in descriptive statistics, or 'analytic', meaning that the focus is on models and hypothesis tests? The fact that you are interested in pooling suggests that the purpose is analytic.

Note that, because the two samples were drawn separately, time period will be part of the stratum identification.
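In practice this usually means building one stratum identifier from the pair (period, stratum), because stratum codes are typically reused in each period: stratum 1 at t0 is not the same stratum as stratum 1 at t1. A minimal sketch follows; the record layout and field names are hypothetical. In Stata, `egen` with the `group()` function does the same job.

```python
# Sketch: give each distinct (period, stratum) pair its own identifier so
# that strata from different periods are not merged. Records are hypothetical.

records = [
    {"period": 0, "stratum": 1},
    {"period": 0, "stratum": 2},
    {"period": 1, "stratum": 1},
    {"period": 1, "stratum": 2},
    {"period": 1, "stratum": 1},
]

def add_pooled_stratum(rows):
    """Assign consecutive integer ids, in order of first appearance,
    to each distinct (period, stratum) pair; modifies rows in place."""
    ids = {}
    for row in rows:
        key = (row["period"], row["stratum"])
        if key not in ids:
            ids[key] = len(ids) + 1
        row["pooled_stratum"] = ids[key]
    return rows

for r in add_pooled_stratum(records):
    print(r)  # pooled_stratum runs 1, 2, 3, 4, 3
```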

Purpose Descriptive:

Use the original weights supplied with the data. The pooled sample then represents the experience of the population during the two survey periods. Sometimes this is a legitimate target for descriptive statistics: if the surveys were done in adjacent years, the pooled sample represents the population's experience over the two-year period. Any descriptive statistic, such as the ratio of men to women, will be a weighted average of the values from the two periods, in which a period with a larger population counts for more. If the two periods are adjacent, such averages might be valuable.
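A small numerical sketch of this averaging, loosely following the 50/50 versus 40/60 example in the question with a population that doubles between periods. The respondents and weights are invented; each period's weights sum to that period's (hypothetical) population size.

```python
# Sketch: with the original weights, the pooled weighted proportion equals
# the average of the period proportions weighted by each period's population
# total. Data are hypothetical.

def weighted_proportion(rows, flag):
    """rows: list of (weight, indicator); share of total weight with indicator == flag."""
    num = sum(w for w, x in rows if x == flag)
    den = sum(w for w, _ in rows)
    return num / den

# (weight, is_man) for each respondent
t0 = [(250_000, 1)] * 2 + [(250_000, 0)] * 2   # 50% men; weights sum to 1,000,000
t1 = [(400_000, 1)] * 2 + [(400_000, 0)] * 3   # 40% men; weights sum to 2,000,000
pooled = t0 + t1

p0 = weighted_proportion(t0, 1)
p1 = weighted_proportion(t1, 1)
pp = weighted_proportion(pooled, 1)
n0 = sum(w for w, _ in t0)
n1 = sum(w for w, _ in t1)

# Pooled estimate = population-weighted average of the period estimates.
assert abs(pp - (n0 * p0 + n1 * p1) / (n0 + n1)) < 1e-12
print(p0, p1, round(pp, 4))   # 0.5 0.4 0.4333 (the unweighted average of 0.5 and 0.4 would be 0.45)
```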

The interpretation is the same if the weights have been post-stratified or raked.
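Post-stratification only rescales the weights within adjustment cells so that they reproduce known population counts; raking repeats that kind of adjustment over several margins in turn. Because the adjusted weights still sum to each period's population total, pooling keeps the same population-weighted interpretation. A minimal one-margin sketch, with hypothetical cells, weights and control totals:

```python
# Sketch of post-stratification on one margin: within each cell, multiply the
# weights by (known population count / current sum of weights in the cell).
# Cells, weights and control totals are hypothetical.

from collections import defaultdict

def post_stratify(rows, control_totals):
    """rows: list of (cell, weight); control_totals: {cell: population count}."""
    cell_sums = defaultdict(float)
    for cell, w in rows:
        cell_sums[cell] += w
    return [(cell, w * control_totals[cell] / cell_sums[cell]) for cell, w in rows]

rows = [("men", 90.0), ("men", 110.0), ("women", 150.0), ("women", 150.0)]
controls = {"men": 240.0, "women": 260.0}

adjusted = post_stratify(rows, controls)
for cell in controls:
    # adjusted weights now reproduce the control totals: men 240.0, women 260.0
    print(cell, sum(w for c, w in adjusted if c == cell))
```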

Purpose Analytic

If you are interested in modeling outcomes as functions of predictors, then pooling is a way of increasing sample size. When you present descriptive statistics, I suggest that you present unweighted statistics. Your readers will want to see the actual numbers, not the weighted population numbers.

In the analysis, most investigators would use the original supplied weights. Suppose that regression coefficients changed between periods (a factor-by-period interaction). Then the estimated coefficients from a model that ignores the interaction will be a weighted average of the period-specific coefficients.
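For a single predictor this can be checked directly. Assuming the model keeps a separate intercept for each period but a common slope, the weighted least-squares slope is the sum of the within-period weighted cross-products divided by the sum of the within-period weighted sums of squares of x (Sxx). It is therefore an Sxx-weighted average of the period-specific slopes. The sketch uses invented data and hypothetical function names.

```python
# Sketch: WLS slope with period-specific intercepts and a common slope equals
# sum(Sxy_p) / sum(Sxx_p) = Sxx-weighted average of the period slopes b_p.
# Data are hypothetical.

def period_stats(xs, ys, ws):
    """Return (Sxy, Sxx) about the weighted means within one period."""
    W = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / W
    ybar = sum(w * y for w, y in zip(ws, ys)) / W
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    return sxy, sxx

def pooled_common_slope(periods):
    """periods: list of (xs, ys, ws); common slope ignoring the x-by-period interaction."""
    stats = [period_stats(*p) for p in periods]
    return sum(sxy for sxy, _ in stats) / sum(sxx for _, sxx in stats)

# True slope is 2 in period 1 and 4 in period 2.
x = [0.0, 1.0, 2.0, 3.0]
p1 = (x, [1 + 2 * xi for xi in x], [1.0] * 4)
p2 = (x, [5 + 4 * xi for xi in x], [1.0] * 4)
print(pooled_common_slope([p1, p2]))        # 3.0: equal Sxx, so a simple average

# Doubling the period-2 weights doubles its Sxx: (1*2 + 2*4) / 3 = 10/3
p2_heavy = (x, p2[1], [2.0] * 4)
print(pooled_common_slope([p1, p2_heavy]))  # about 3.333
```

The second call shows how larger weights in one period pull the pooled coefficient toward that period's value.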

Most investigators would weight the analysis of the pooled data. I might not. Suppose that the population doubled in size between the two time periods, but that the sample sizes were similar. In the weighted data, observations in the second period will have twice the weight of observations in the first period. I would consider this undesirable and would
