Re: st: grouped duration-discrete time survival analysis-WAS stset...
On Mon, 2 Dec 2002 04:24:09 -0800 (PST) Enrica Croda wrote:
> On Mon, 2 Dec 2002, Stephen P. Jenkins wrote:
> > On Sun, 1 Dec 2002 03:07:48 -0800 (PST) Enrica Croda
> > <firstname.lastname@example.org> wrote:
> > <snip>
> > > So, to recap, I now believe my data are grouped duration data...
> > > I understand that in this case I need to organize my data in the so-called
> > > "person-period" form.
> > > I would appreciate getting feedback on the following:
> > > My data are already organized by ID and year in "long" panel data
> > > form (iis ID, tis year) with year = 1984, 1985,...1998.
> > > A. Do I need to -expand- the data set?
> > > I am thinking I just need to generate the analysis time
> > > variable, with something like:
> > > (A1) by ID: generate TIME = _n;
> > > please see also question B, below.
> > > B. How do I deal with delayed entry?
> > > Assuming people first become at risk of not living independently at age 65,
> > > which may not be the age at which they are first observed in my data,
> > > how do I incorporate this information in my analysis?
> > Suppose first that there is no delayed entry -- in which case you would
> > need a row in the data set corresponding to each year that each person
> > was /at risk of experiencing the event of interest/. If you were to
> > assume the first year at risk corresponds to age 65, you need rows for
> > each person for each year corresponding to age 65+. As the first survey
> > year (1984 in GSOEP) is after age 65 for most persons, then you
> > would need to create new rows in the data corresponding to those ages
> > before the beginning of the survey. The TIME variable starts with 1 for
> > age 65, then 2 for age 66, and so on. [You would also need to 'spread'
> > values for explanatory variables back onto these new person-year obs.]
> > -expand- could probably be used to create the required data structure,
> > making use of the -if- qualifier to ensure that the correct number of
> > new person-year observations gets generated for each person. (As the
> > respondents were of different ages in 1984, the number of new data rows
> > will differ from person to person.)
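For concreteness, here is a rough sketch of how -expand- might be used to build the no-delayed-entry structure described above. It is only an illustration under stated assumptions: the variable -age- (age at interview), the helper variables first, ncopies, obsno and yrsback, and the three toy rows are all made up for the example and are not from the GSOEP data. It also assumes age rises by exactly one per survey year.

```stata
* Sketch only. -age- is an assumed variable name and the toy rows below
* are invented purely for illustration.
clear
input ID year age income
1 1984 67 100
1 1985 68 110
2 1984 65 200
end

* Flag each person's first observed row.
bysort ID (year): generate byte first = (_n == 1)

* Number of rows wanted in place of the first row: the row itself plus
* one new row for each year of age from 65 up to the year before entry.
generate ncopies = age - 64

* Tag every row so that copies can be told apart after -expand-.
generate long obsno = _n

* Only first rows of people first seen after age 65 get extra copies.
expand ncopies if first & age > 65 & !missing(age)

* Within each set of copies, step year and age back one year at a time.
* Other variables (here income) simply keep their first-observed values,
* which is one crude way of 'spreading' them.
bysort obsno: generate yrsback = _n - 1
replace year = year - yrsback
replace age  = age  - yrsback

* Duration counter: 1 at age 65, 2 at age 66, and so on.
generate TIME = age - 64
sort ID year
list ID year age TIME income, sepby(ID)
```

In this toy example person 1 gains two pre-panel rows (1982 and 1983) and person 2 gains none, so the number of new rows differs from person to person, as noted above.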
> Ideally, I would like to use some time-varying variables (e.g. income)
> in the analysis. What would be the appropriate thing to do for these
> variables when I 'spread' them?
You would have to create the appropriate values. Of course the fact
that those new person-year observations are before the start of the
panel may constrain what you are able to create. But in fact, if you
make the delayed-entry 'correction' as discussed, the pre-panel
person-years are dropped before estimation, so values of the
time-varying covariates (TVCs) for those years are not needed.
> > Now, to control for the delayed entry aspect and get the likelihood
> > correct, all you need do is create the data structure as just stated,
> > but throw away the person-years corresponding to pre-1984 (first survey
> > year). (Note that the duration counter TIME does not start from 1 in
> > most cases in the delayed-entry version of the data set.)
> I am afraid I am still missing something. Please forgive me if this is a
> silly question. If I understand correctly, the only variable I really
> need is the appropriate 'analysis time' counter. I will throw away all the
> records generated through -expand-. Correct?
I was attempting to discuss general principles rather than special
cases, hoping to help understanding. It appears (from a brief glance)
that, given that you already have person-year data for the period
covered by the panel, you will not have to -expand-, and your code
achieves what is required to generate the correct duration counter.
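As an illustration of the general principle only (not a reproduction of the code in the earlier message), a duration counter for the delayed-entry version of the data might be built along the following lines when the data are already in person-year form. The variable -age- (age at interview), the variable name ENTRY, and the toy rows are assumptions made up for this sketch.

```stata
* Sketch only. -age- and ENTRY are assumed names; the toy rows are
* invented purely for illustration.
clear
input ID year age
1 1984 67
1 1985 68
2 1984 65
2 1985 66
end

* Keep only person-years at risk (age 65 and over, by assumption).
drop if age < 65

* Duration counter measured from age 65, not from the first survey year:
* 1 at age 65, 2 at age 66, and so on.
generate TIME = age - 64

* First observed value of TIME for each person. It exceeds 1 for anyone
* first observed after age 65, i.e. for the delayed-entry cases.
bysort ID (year): generate ENTRY = TIME[1]
list ID year age TIME ENTRY, sepby(ID)
```

Here person 1 enters the data with TIME equal to 3, while person 2 enters at TIME equal to 1; no pre-panel rows are created at any point.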
Professor Stephen P. Jenkins <email@example.com>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.