The Stata listserver

Re: st: grouped duration-discrete time survival analysis-WASstset help,please

From   "Stephen P. Jenkins"
Subject   Re: st: grouped duration-discrete time survival analysis-WASstset help,please
Date   Mon, 2 Dec 2002 09:41:15 +0000 (GMT Standard Time)

On Sun, 1 Dec 2002 03:07:48 -0800 (PST) Enrica Croda wrote:


> So, to recap, I now believe my data are grouped duration data... 
> I understand that in this case I need to organize my data the so-called
> "person-period" form. 
> I would appreciate getting feedback on the following: 
> My data are already organized by ID and year in "long" panel data
> form (iis ID, tis year) with year = 1984, 1985,...1998.
> A. Do I need to -expand- the data set? Am I correct in thinking
> that I do not? I am thinking I just need to generate the analysis time
> variable, with something like: 
> (A1)	by ID: generate TIME = _n;
> please see also question B, below. 
> B. How do I deal with delayed entry?
> Assuming people first become at risk of not living independently at age 65,
> which may not be the age at which they are first observed in my data,
> how do I incorporate this information in my analysis?

Suppose first that there is no delayed entry -- in which case you would 
need a row in the data set corresponding to each year that each person 
was /at risk of experiencing the event of interest/. If you were to 
assume the first year at risk corresponds to age 65, you need rows for 
each person for each year corresponding to age 65+. Because most persons 
were already older than 65 in the first survey year (1984 in GSOEP), you 
would need to create new rows in the data corresponding to those ages 
before the beginning of the survey. The TIME variable starts with 1 for 
age 65, then 2 for age 66, and so on. [You would also need to 'spread' 
values for explanatory variables back onto these new person-year obs.]
-expand- could probably be used to create the required data structure, 
making use of the -if- qualifier to ensure that the correct number of 
new person-year observations gets generated for each person. (As the 
respondents were of different ages in 1984, the number of new data rows 
will differ from person to person.)
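
A minimal sketch of that -expand- step, assuming each person's first row is the 1984 interview and that age is recorded in whole years. The toy data and the helper names first, nback, added and back are made up for illustration; ID, year and age follow the names used in the question.

```stata
* Toy panel (made-up values): person 1 is 67 in 1984, person 2 is 65
clear
input ID year age
1 1984 67
1 1985 68
2 1984 65
2 1985 66
end

* Years at risk before the first interview = age at first interview - 65
bysort ID (year): generate byte first = (_n == 1)
generate nback = age - 65 if first

* Copy each person's first row nback extra times; added marks the copies.
* When nback is 0 the expression is 1 and the row is kept without copies.
expand nback + 1 if first, generate(added)

* Re-date the copies backwards from the first interview: 1983, 1982, ...
bysort ID added: generate back = _n if added
replace year = year - back if added
replace age  = age - back if added

* Duration counter: 1 at age 65, 2 at age 66, and so on
bysort ID (year): generate TIME = _n
drop first nback added back
list, sepby(ID)
```

The copies carry the first-interview values of every other variable, which is one crude way of spreading explanatory variables back; whether that is sensible depends on the variable.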

Now, to control for the delayed entry aspect and get the likelihood 
correct, all you need do is create the data structure as just stated, 
but throw away the person-years corresponding to pre-1984 (first survey 
year). (Note that the duration counter TIME does not start from 1 in 
most cases in the delayed-entry version of the data set.)  All this is 
discussed in those lecture notes you cited, together with regression 
models that you could apply once the data have been created.
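
To illustrate the delayed-entry step with made-up data: start from the full at-risk structure (a row for every year from age 65 on), build TIME, and only then drop the pre-survey rows. The values below are hypothetical; ID, year, age and TIME follow the names used above.

```stata
* Full at-risk history (made-up values): one row per year from age 65
clear
input ID year age
1 1982 65
1 1983 66
1 1984 67
1 1985 68
2 1984 65
2 1985 66
end

* Duration counter built on the full at-risk history
bysort ID (year): generate TIME = _n

* Delayed entry: discard person-years before the first survey year
drop if year < 1984

* Person 1 now enters at TIME = 3, person 2 at TIME = 1
list, sepby(ID)
```

In this toy case TIME equals age - 64 on every remaining row, so under the age-65 assumption the same counter could presumably be generated directly on the observed panel.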

> C. Would the solution to question B be different if I plan to control for
> age in the 'regression' analysis?

Given the way you have defined your time-at-risk variable (in terms of 
age), wouldn't "age" as an explanatory variable be perfectly correlated 
with TIME?   
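
A quick check of this point on made-up data, assuming TIME is defined as age minus 64 (so that TIME is 1 at age 65):

```stata
* Toy panel (made-up values)
clear
input ID year age
1 1984 67
1 1985 68
2 1984 65
2 1985 66
end

* TIME defined in terms of age, as in the discussion above
generate TIME = age - 64

* age is an exact linear function of TIME, so the correlation is 1
correlate age TIME
```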

> D. Do I still need to stset the variables?

No. -st- is designed primarily for continuous time duration models. One 
can use the -st- utilities to reorganise your data and so on, but that 
is a different issue. You don't need -stset- in order to estimate 
discrete time duration models.
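
As a rough sketch of what estimation without -stset- might look like, the example below simulates person-period data and fits a complementary log-log model with -cloglog-. Everything here is an assumption made for illustration: the data are simulated, and the names x, h, event, cumev and lnTIME, the choice of cloglog rather than logit, and the log(TIME) baseline are not taken from the question.

```stata
* Simulated persons with one fixed covariate x (all values made up)
clear
set seed 12345
set obs 500
generate ID = _n
generate x = rnormal()

* Up to 10 yearly rows per person
expand 10
bysort ID: generate TIME = _n

* Simulate a yearly event from an assumed cloglog hazard
* (no true duration dependence is built into this toy hazard)
generate h = 1 - exp(-exp(-2 + 0.5*x))
generate byte event = runiform() < h

* Keep each person's rows up to and including the first event
bysort ID (TIME): generate cumev = sum(event)
drop if cumev > 1 | (cumev == 1 & event == 0)

* Discrete-time hazard regression on the person-period rows; no -stset-
generate lnTIME = ln(TIME)
cloglog event x lnTIME
```

The dependent variable is simply a 0/1 indicator that equals 1 in the person-year in which the event occurs, and the duration counter enters as an ordinary regressor.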

Professor Stephen P. Jenkins
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
