
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Panel data and sparse data
Date: Wed, 16 Jul 2008 12:41:52 -0400

James Nachbaur--
Sounds like World Bank data or the data sources on which it is based, e.g. national census/survey data collected at irregular, heterogeneous intervals. Rather than interpolating or imputing, you may want to fill forward, so each variable is measured as of the last time observed, but limit this to one observation "filled in" in the final data. E.g. if you have data on India in 1975, 1980, 1985, 1990, and 1995, but you have data on Pakistan in 1977, 1982, 1989, and 1994, maybe you want to use obs defined as of 1977, 1982, 1990, and 1995, and use the most recent year of data for each of those.

You certainly don't want to conduct a survival analysis as if you have 21 (or even 19) years of data on each country, which is what interpolation/imputation would imply. The first step in this process, I think, is to determine the number of observations you can plausibly use, given different choices over years to include. What are you planning to do about countries merging/splitting/being born?

If you ignore advice not to interpolate, at least do it in logs for vars which are strictly positive (won't matter for all vars, but where it matters, e.g. population or GDP, it is probably superior). E.g.

sysuse uslifeexp
g y=le if mod(year,5)==0
g lny=ln(y)
ipolate lny year, gen(iy)
g exp=exp(iy)
line le year || sc exp y year

Note in the example how poor the interpolated data can be.

On 7/16/08, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> In this context imputation is usually called interpolation, with a
> centuries-long history to boot. And you can do it in various ways from
> linear interpolation (-ipolate-) and cubic interpolation (-cipolate-
> from SSC) upwards.
>
> But my visceral reaction is, for your situation, Don't. Survival
> analysis is in a strong sense geared to make use of the information you
> have and interpolation would just be a way of kidding yourself you had
> more.
>
> Nick
> n.j.cox@durham.ac.uk
>
> James Nachbaur
>
> I have a panel data set of 165 counties over 55 years with many
> variables observed every 10 years or every 4 to 5 years. I am running
> a survival time model with unobserved heterogeneity. My question for
> the list is, What is a good way to impute data for the years that lack
> observations? In my research, I have seen a lot on variables missing
> at random, or on data sets where only one variable has missing data,
> but my situation is not like those.
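The fill-forward suggestion in the reply (measure each variable as of the last year it was observed, then keep only a few chosen reference years) might be sketched in Stata roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the data are toy values built on the spot, the names country, x, x_asof, obsyear, and lag are hypothetical, and the panel is assumed to have one row per country-year, including years in which x is missing.

```stata
* Toy data, not real figures: x simply records the year it was measured in.
clear
set obs 42
gen str8 country = cond(_n <= 21, "India", "Pakistan")
bysort country: gen year = 1974 + _n
gen x = .
replace x = year if country == "India" & mod(year, 5) == 0
replace x = year if country == "Pakistan" & inlist(year, 1977, 1982, 1989, 1994)

* Fill forward within country: each missing value takes the last observed one.
bysort country (year): gen x_asof = x
bysort country (year): replace x_asof = x_asof[_n-1] if missing(x_asof)

* Record the year each carried value was actually observed, and its age.
gen obsyear = year if !missing(x)
bysort country (year): replace obsyear = obsyear[_n-1] if missing(obsyear)
gen lag = year - obsyear

* Keep only the reference years chosen in the India/Pakistan example.
keep if inlist(year, 1977, 1982, 1990, 1995)
list country year x_asof obsyear lag, sepby(country)
```

The lag variable makes the age of each carried-forward value explicit, so one could, for instance, drop observations whose value is older than some chosen cutoff instead of treating every retained year as freshly measured.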

References:
- st: Panel data and sparse data, from "James Nachbaur" <nachbaur@gmail.com>
- st: RE: Panel data and sparse data, from "Nick Cox" <n.j.cox@durham.ac.uk>
