
From: Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Which weight
Date: Sun, 14 Oct 2007 14:00:57 -0400

On Oct 8, 2007, at 11:44 AM, Sergiy Radyakin wrote:

Dear Steven,

Your answer to Nikolaos' question sounds perfectly reasonable. But is there also a straightforward answer in the following situations?

Situation A: N households were chosen at time t0 and followed as a panel. M households were added to the panel at t1 (t0 < t1). At any point t I have cross-sectional weights and a probability of staying in the sample for each household. At any moment t an event of interest can happen in a household (e.g. a child leaves the household). When it does, I add the characteristics of the household at time t [and maybe t-1] to "my" sample. Which weights should I use? I obviously can't use the weights at t0, because not all households were in the sample at t0, and I can't use the weights at t1, because some of the households in which the event of interest occurred before t1 dropped out before t1 due to panel mortality.

Situation B: N households are chosen at time t0 and M households are chosen at t1. pweights are given. However, between t0 and t1 the population (e.g. the population of a country) from which the samples were drawn has changed (e.g. doubled in size). I am working with a pooled sample of households (N+M). Which weights can I use? If I am working with one subpopulation only (e.g. men) and the proportion of these cases has changed (e.g. women/men = 50/50 at t0 but 60/40 at t1), can I still pool observations? If yes, how should I interpret the estimates? [This is not a panel case.]

Is there any good online guide on longitudinal weights, preferably with plain examples of how to deal with situations like those outlined above?

Sergiy,

There is no easy answer to your questions. I do not know of a good text on panel study weighting. I learned much of what I know by studying the documentation for some of the large panel studies.

Situation A

Recall that a sample weight is, roughly, the number of population members 'represented' by the observation; the sum of sample weights should equal the total number of population members.
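To make the "number of population members represented" idea concrete, here is a minimal sketch with invented numbers. It assumes a stratified design with simple random sampling inside each stratum, so each household's weight is the inverse of its selection probability; the strata names and sizes are hypothetical, not from any real survey.

```python
# Hypothetical stratified design: strata and sizes are invented for illustration.
population_sizes = {"urban": 6000, "rural": 4000}  # households per stratum (assumed)
sample_sizes = {"urban": 300, "rural": 100}         # households sampled per stratum (assumed)


def base_weights(pop_sizes, samp_sizes):
    """Weight of one sampled household in each stratum.

    With simple random sampling inside a stratum, the selection probability
    is n_h / N_h, so the weight is N_h / n_h: the number of population
    households that each sampled household 'represents'.
    """
    return {h: pop_sizes[h] / samp_sizes[h] for h in pop_sizes}


weights = base_weights(population_sizes, sample_sizes)

# Summing the weights over every sampled household recovers the population total.
total = sum(weights[h] * sample_sizes[h] for h in weights)

print(weights)  # {'urban': 20.0, 'rural': 40.0}
print(total)    # 10000.0
```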

If you need data only at calendar time 't', you would use the cross-sectional weight for 't'. The data set authors will have calibrated these weights carefully. For example, panels may rotate every six months, but the weights may be appropriate to the entire calendar-year population. The documentation should make this clear.

However, you want to use information from period 't-1' to predict outcomes in 't'. So you require observations with data at both 't-1' and 't'; these are a subset of those with data at 't' or at 't-1'. Summed over this subset, neither the 't' nor the 't-1' weights will add to the population totals for those periods, so neither is a proper weight. Yet if the sampling protocol did not change drastically over your period of interest, these weights should be approximately proportional to the proper weights. I recommend that you use the 't' weights.
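Why "approximately proportional" is good enough: means, proportions and regression coefficients do not change when every weight is multiplied by the same constant, whereas estimated totals do. The sketch below uses invented household records (all ids, weights and outcomes are hypothetical) and assumes the 't' weights of all households observed at 't' sum to the population total.

```python
# Hypothetical records: id -> (observed at t-1, observed at t, weight at t, outcome at t).
households = {
    "h1": (True, True, 120.0, 1),
    "h2": (True, True, 80.0, 0),
    "h3": (False, True, 150.0, 1),  # joined the panel at t, so no t-1 data
    "h4": (True, True, 50.0, 1),
    "h5": (True, True, 100.0, 0),
}


def weighted_mean(pairs):
    """Weighted mean of (weight, value) pairs."""
    total_w = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total_w


# Keep only households observed at both t-1 and t, carrying their 't' weight.
both = [(w, y) for seen_prev, seen_now, w, y in households.values()
        if seen_prev and seen_now]

subset_total = sum(w for w, _ in both)                 # falls short of the population total
full_total = sum(rec[2] for rec in households.values())  # assumed population total at t

# Rescaling all weights by one constant leaves the weighted mean unchanged.
scale = full_total / subset_total
rescaled = [(w * scale, y) for w, y in both]

print(subset_total, full_total)  # 350.0 500.0
print(weighted_mean(both), weighted_mean(rescaled))
```

The two printed means agree up to floating-point rounding, which is the sense in which the subset's 't' weights remain usable even though they no longer add to the population total.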

Situation B

An answer will depend on the purpose of your analysis. Is it 'descriptive', meaning that you are interested only in descriptive statistics, or 'analytic', meaning that the focus is on models and hypothesis tests? The fact that you are interested in pooling suggests the purpose is analytic.

Note that time period will be a part of the stratum identification.
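A small sketch of what making the time period part of the stratum identification means in practice: stratum codes are often reused across survey rounds, so code 1 at t0 and code 1 at t1 must be treated as different design strata. The records below are hypothetical. In Stata the analogous step would usually be building a combined identifier with `egen` and its `group()` function before `svyset`; the Python only mirrors that logic.

```python
# Hypothetical pooled records: (period, original stratum code, weight).
records = [
    ("t0", 1, 20.0),
    ("t0", 2, 35.0),
    ("t1", 1, 40.0),
    ("t1", 2, 70.0),
]

# Each distinct (period, stratum) pair becomes its own pooled stratum.
pooled_strata = sorted({(period, stratum) for period, stratum, _ in records})
codes = {key: i for i, key in enumerate(pooled_strata, start=1)}

# Attach the new stratum code to each record, keeping its weight.
labelled = [(codes[(period, stratum)], w) for period, stratum, w in records]

print(codes)     # {('t0', 1): 1, ('t0', 2): 2, ('t1', 1): 3, ('t1', 2): 4}
print(labelled)  # [(1, 20.0), (2, 35.0), (3, 40.0), (4, 70.0)]
```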

Purpose Descriptive:

Use the original weights supplied with the data. The pooled sample represents the experience of the population during the two survey periods. Sometimes this is a legitimate target for descriptive statistics: if the two surveys were done in adjacent years, then the pooled sample represents the population experience over the two-year period. Any descriptive statistic, such as the ratio of men to women, will be a weighted average of the corresponding statistics from the two periods. If the two periods are adjacent, such averages might be valuable.
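To illustrate the averaging with invented numbers, the sketch uses a proportion (the share of men) rather than a ratio, because for a proportion the relationship is exact: the pooled weighted share equals the two period shares averaged with each period's summed weights, i.e. its estimated population size. The data and weights are hypothetical.

```python
# Hypothetical pooled data: (weight, male) with male = 1 for men, 0 for women.
period_t0 = [(10.0, 1), (10.0, 0), (10.0, 1), (10.0, 0)]              # weights sum to 40
period_t1 = [(20.0, 1), (20.0, 0), (20.0, 0), (20.0, 0), (20.0, 0)]   # weights sum to 100


def weighted_share(rows):
    """Weighted proportion of rows with value 1."""
    total_w = sum(w for w, _ in rows)
    return sum(w * x for w, x in rows) / total_w


p0 = weighted_share(period_t0)
p1 = weighted_share(period_t1)
pooled = weighted_share(period_t0 + period_t1)

# Estimated population size in each period is the sum of that period's weights.
n0 = sum(w for w, _ in period_t0)
n1 = sum(w for w, _ in period_t1)
average = (n0 * p0 + n1 * p1) / (n0 + n1)

print(p0, p1)           # 0.5 0.2
print(pooled, average)  # equal up to rounding
```

Because the second period's estimated population is larger, the pooled share sits closer to that period's value.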

The interpretation is the same if the weights have been post-stratified or raked.

Purpose Analytic

If you are interested in modeling outcomes as functions of predictors, then pooling is a way of increasing sample size. When you present descriptive statistics, I suggest that you present unweighted statistics. Your readers will want to see the actual numbers, not the weighted population numbers.

In the analysis, most investigators would use the original supplied weights. If the regression coefficients changed between periods (a factor-by-period interaction), then the estimated coefficients that ignore the interaction will be a weighted average of the period-specific coefficients.
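A minimal sketch of that weighted-average property in the simplest case this writer can verify exactly: a weighted least-squares regression through the origin with one predictor. There the pooled slope equals the period-specific slopes averaged with weights equal to each period's sum of w·x². With an intercept or several predictors the pooled estimate is generally not an exact weighted average of this form, so treat the example as intuition only. All data are invented.

```python
# Hypothetical (weight, x, y) observations in each period.
period_a = [(1.0, 1.0, 2.0), (1.0, 2.0, 4.0), (1.0, 3.0, 6.0)]  # slope 2
period_b = [(2.0, 1.0, 3.0), (2.0, 2.0, 6.0)]                   # slope 3


def weighted_sxx(rows):
    """Sum of w * x^2, the 'information' each period contributes to the slope."""
    return sum(w * x * x for w, x, _ in rows)


def wls_slope_through_origin(rows):
    """Weighted least-squares slope for the model y = b * x (no intercept)."""
    sxy = sum(w * x * y for w, x, y in rows)
    return sxy / weighted_sxx(rows)


b_a = wls_slope_through_origin(period_a)
b_b = wls_slope_through_origin(period_b)
b_pooled = wls_slope_through_origin(period_a + period_b)

# Pooled slope as an average of period slopes, weighted by each period's sum of w * x^2.
s_a, s_b = weighted_sxx(period_a), weighted_sxx(period_b)
b_avg = (s_a * b_a + s_b * b_b) / (s_a + s_b)

print(b_a, b_b)          # 2.0 3.0
print(b_pooled, b_avg)   # equal up to rounding
```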

Most investigators would weight the analysis of the pooled data. I might not. Suppose that the population doubled in size between the two time periods, but that the sample sizes were similar. In the weighted data, observations in the second period will have twice the weight of observations in the first period. I would consider this undesirable and would


References:
st: Which weight, from "Nikolaos Kanellopoulos" <nkkanel@yahoo.gr>
Re: st: Which weight, from Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
Re: st: Which weight, from "Sergiy Radyakin" <serjradyakin@gmail.com>
