
From: Enrica Croda <croda@nicco.sscnet.ucla.edu>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: grouped duration-discrete time survival analysis-WAS stset...

Date: Mon, 2 Dec 2002 04:24:09 -0800 (PST)

On Mon, 2 Dec 2002, Stephen P. Jenkins wrote:

> On Sun, 1 Dec 2002 03:07:48 -0800 (PST) Enrica Croda
> <croda@nicco.sscnet.ucla.edu> wrote:
>
> <snip>
>
> > So, to recap, I now believe my data are grouped duration data...
> > I understand that in this case I need to organize my data in the so-called
> > "person-period" form.
> > I would appreciate getting feedback on the following:
> > My data are already organized by ID and year in "long" panel data
> > form (iis ID, tis year) with year = 1984, 1985,...1998.
> > A. Do I need to -expand- the data set?
> > I am thinking I just need to generate the analysis time
> > variable, with something like:
> > (A1) by ID: generate TIME = _n;
> > please see also question B, below.
> > B. How do I deal with delayed entry?
> > Assuming people first become at risk of not living independently at age 65,
> > which may not be the age at which they are first observed in my data,
> > how do I incorporate this information in my analysis?
>
> Suppose first that there is no delayed entry -- in which case you would
> need a row in the data set corresponding to each year that each person
> was /at risk of experiencing the event of interest/. If you were to
> assume the first year at risk corresponds to age 65, you need rows for
> each person for each year corresponding to age 65+. As the first survey
> year (1984 in GSOEP) is after age 65 for most persons, then you
> would need to create new rows in the data corresponding to those ages
> before the beginning of the survey. The TIME variable starts with 1 for
> age 65, then 2 for age 66, and so on. [You would also need to 'spread'
> values for explanatory variables back onto these new person-year obs.]
> -expand- could probably be used to create the required data structure,
> making use of the -if- qualifier to ensure that the correct number of
> new person-year observations gets generated for each person. (As the
> respondents were of different ages in 1984, the number of new data rows
> will differ from person to person.)
>

Ideally, I would like to use some time-varying variables (e.g. income)
in the analysis. What would be the appropriate thing to do for these
variables when I 'spread' them?

> Now, to control for the delayed entry aspect and get the likelihood
> correct, all you need do is create the data structure as just stated,
> but throw away the person-years corresponding to pre-1984 (first survey
> year). (Note that the duration counter TIME does not start from 1 in
> most cases in the delayed-entry version of the data set.)

I am afraid I am still missing something. Please forgive me if this is
a silly question.

If I understand correctly, the only variable I really need is the
appropriate 'analysis time' counter. I will throw away all the records
generated through -expand-. Correct?

If this is correct, could I accomplish the same goal by not expanding
at all, and using NEWTIME rather than TIME as 'analysis time', where
NEWTIME is generated as follows:

    by ID: generate newtime= _n + (age[1] - 66);
    label variable newtime "analysis time";
    by ID: generate agediff= (age[1] - 65) if year==84;
    label variable agediff "age-65 in 1984";
    by ID: generate ageflag= agediff[1] if (agediff[1]~=.);
    label variable ageflag "auxiliary var";
    by ID: replace newtime=_n if ageflag==.;

Here is a listing of what I get with this code:

       ID   year   age   newtime
      201     91    65         1
      201     92    66         2
      201     93    67         3
      201     94    68         4
      201     95    69         5
      201     96    70         6
      201     97    71         7
      201     98    72         8
     1101     84    78        13
     1101     85    79        14
     1101     86    80        15
     1101     87    81        16
     1101     88    82        17
     1101     89    83        18
     1101     90    84        19
     1101     91    85        20
     1101     92    86        21
     1101     93    87        22
     1101     94    88        23
     1101     95    89        24
     1101     96    90        25
     1101     97    91        26
     1101     98    92        27
    20302     87    65         1
    20302     88    66         2
    20302     89    67         3
    20302     90    68         4
    20302     91    69         5
    20302     94    72         6
    20302     95    73         7
    20302     96    74         8
    20302     97    75         9
    20302     98    76        10

> All this is
> discussed in those lecture notes you cited, together with regression
> models that you could apply once the data have been created.
>

Thanks! Your lecture notes are indeed extremely helpful (I also got
your 1995 article in the Oxford Bulletin of Economics and Statistics),
and I think I understand what to do for the estimation part of the
project. It is the preparation of the data set for the analysis that I
still find complicated. (This is the first time I have done duration
analysis.)

> > C. Would the solution to question B be different if I plan to control for
> > age in the 'regression' analysis?
>
> Given the way you have defined your time-at-risk variable (in terms of
> age), wouldn't "age" as an explanatory variable be perfectly correlated
> with TIME?
>

Yes, it would! Thanks for pointing it out!

<snip>

Thank you very much for all your help!

Enrica

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
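For reference, the analysis-time definition quoted in this thread (TIME is 1 at age 65, 2 at age 66, and so on) can be written directly in terms of age, without -expand- and without relying on _n. The following is only a sketch. It assumes the variable names ID, year and age used in the message, one row per observed person-year, and that everyone first becomes at risk at exactly age 65; the name TIME follows the thread.

```stata
* A minimal sketch, not the poster's or the respondent's code.
* Assumptions: variables ID, year and age exist as in the message;
* each row is one observed person-year; risk starts at age 65.

sort ID year

* Analysis time counted from the start of risk:
* 1 at age 65, 2 at age 66, and so on.
generate TIME = age - 64 if age >= 65 & !missing(age)
label variable TIME "analysis time (years at risk since age 65)"

* Person-years before age 65 are not at risk, so they are dropped.
* (Rows with missing age are kept, with TIME missing.)
drop if age < 65

* Inspect the result person by person.
list ID year age TIME, sepby(ID)
```

Under this definition ID 1101, aged 78 in 1984, enters with TIME = 14, and ID 20302 has TIME = 8 at age 72 after the two missing survey years. The _n-based counter in the listing above gives 13 and 6 for those rows, so the two constructions do not coincide when entry is delayed or when a person's panel has gaps.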

**Follow-Ups**: **Re: st: grouped duration-discrete time survival analysis-WAS stset...** *From:* "Stephen P. Jenkins" <stephenj@essex.ac.uk>

**References**: **Re: st: grouped duration-discrete time survival analysis-WAS stset help, please** *From:* "Stephen P. Jenkins" <stephenj@essex.ac.uk>
