The Stata listserver

Re: st: stset help, please

From   "Stephen P. Jenkins"
Subject   Re: st: stset help, please
Date   Thu, 28 Nov 2002 15:02:58 +0000 (GMT Standard Time)

On Wed, 27 Nov 2002 23:37:46 -0800 (PST) Enrica Croda wrote:

> Dear Stata-experts:
> I am a newbie at survival analysis, and I would appreciate
> your help with properly setting up the dataset for the analysis
> with Stata 7.
> I have annual data for 15 years from a panel household survey
> on living arrangements of elderly households.
> My original data come in the form of an _unbalanced_ dataset,
> where the data are organized by ID and year (iis ID, tis year
> in xt-language).
> The data set covers the period 1984 through 1998 (15 years in
> total), but elderly households are in my sample only if the head
> of the household is older than 65. If they die or drop out of the
> survey, I have no records for them after the death or drop-out.


Ignoring repeated spell issues for the moment, ...

The date at which people first become at risk of not living 
independently need not coincide with the first date at which they were 
observed in your panel. The former is the date at which the survival 
time clock starts ticking (t=0 in expressions for S(t)), whereas the 
latter is relevant for "delayed entry" adjustments. In essence, 
when modelling the time to transition to dependent living using your 
type of data, you have to condition on 'survival' (remaining 
independent) between t=0 and the date at which each person was first 
surveyed (assuming that they were living independently then).
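
To spell out what conditioning on 'survival' means, here is a sketch in 
notation that is an assumption for illustration, not something from the 
original question: t_{0i} is the analysis time at which person i was 
first surveyed, t_i the time of transition or censoring, d_i = 1 if a 
transition was observed (0 if censored), h the hazard and S the 
survivor function. Under those assumptions, the likelihood contribution 
of person i would be:

```latex
L_i \;=\; \left[ h(t_i) \right]^{d_i} \, \frac{S(t_i)}{S(t_{0i})},
\qquad 0 \le t_{0i} < t_i
```

When t_{0i} = 0, so that S(t_{0i}) = 1, this reduces to the usual 
contribution without delayed entry.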

You might ease the problem by making the simplifying assumption that 
everyone lives independently up to age 65 at least (which might be 
reasonable for the majority of the population), and set t=0 for that 
age.   Things are more complicated if not everyone is living 
independently at the start of the panel (though all the examples you 
showed us had indep==1 in first wave observed) -- the reason being that 
those already living dependently may be a non-random sample (a 
'selection bias' sort of issue). [This may be relevant to your survey 
because, if it is the survey I think it is, then the spouses of 
household heads need not be 65 -- they may be younger or older.]
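
As a rough illustration of setting t=0 at age 65, the sketch below 
shifts ages into analysis time. It is written in Python only to keep it 
language-neutral; the function name, the Spell record and the example 
person are all hypothetical. In Stata terms, this is roughly what the 
origin() and enter() options of -stset- express.

```python
from dataclasses import dataclass

ORIGIN_AGE = 65.0  # assumption: everyone lives independently up to this age


@dataclass
class Spell:
    entry: float  # analysis time at first interview (delayed entry if > 0)
    exit: float   # analysis time at transition or at last interview
    event: int    # 1 = stopped living independently, 0 = censored


def to_analysis_time(age_first_seen, age_last_seen, event):
    """Shift ages so that t=0 corresponds to age 65."""
    if age_first_seen < ORIGIN_AGE:
        # e.g. a younger spouse: the simplifying assumption does not cover them
        raise ValueError("person observed before the assumed origin age")
    if age_last_seen <= age_first_seen:
        raise ValueError("exit must come after entry")
    return Spell(age_first_seen - ORIGIN_AGE,
                 age_last_seen - ORIGIN_AGE,
                 int(event))


# Hypothetical person: first interviewed at 70, dependent living from 76.
spell = to_analysis_time(70.0, 76.0, True)
print(spell)
```

A person first seen at 70 therefore enters the risk set at t=5 rather 
than at t=0, which is the delayed entry adjustment.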

An additional complication is differential mortality and attrition, 
which are presumably 'competing risks' with the hazard of not living 
independently. Again some initial progress can be made by assuming the 
competing risks are independent, and applying standard techniques. See 
e.g. standard texts or 
[If you want to get into serious modelling of repeated spells &/or 
non-independent competing risks, you'll get into the relatively 
complicated world of "mixture hazard" models fairly quickly.]
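
On the independence assumption: in practice it amounts to treating 
exits by death or attrition as censoring when modelling the hazard of 
not living independently. A minimal sketch in Python, with made-up 
exit-reason labels that are assumptions rather than variables from the 
survey:

```python
def cause_specific_event(exit_reason, cause="dependent"):
    """Return 1 if the spell ended in the cause of interest, else 0.

    All other ways of leaving the sample (death, attrition, end of the
    panel) are treated as censoring, which is only appropriate if the
    competing risks are assumed to be independent.
    """
    return 1 if exit_reason == cause else 0


# Hypothetical exit reasons for four people
exits = ["dependent", "died", "attrited", "still_independent"]
events = [cause_specific_event(reason) for reason in exits]
print(events)
```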

An article addressing issues that are related to yours is:
Meghir, C. and Whitehouse, E. (1997), 'Labour market transitions and 
retirement of men in the UK', Journal of Econometrics 79, 327-354.

Finally, do you have the exact dates at which transitions are made, or 
just the survey year?  You are using -stset- and -snapspan-, which in 
effect assume the former. If it is the latter (as appears from your 
output), you have grouped duration data ('interval censoring'), in 
which case discrete time models may be a more appropriate way to 
proceed. [See the URL above]  The issues about defining entry and 
delayed entry times, and competing risks, concern those models too, 
of course.
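
For the discrete time route, the usual first step is to reorganise the 
data so that there is one record per person per interval at risk. The 
sketch below, in Python with hypothetical names, assumes time is 
measured in whole years since the origin; intervals before the first 
interview are simply left out, which is how delayed entry is handled in 
this setup.

```python
def person_period(entry, exit, event):
    """Expand one spell into one row per interval at risk.

    entry, exit: whole years since t=0 (entry < exit).
    event: 1 if the transition occurred in the last interval, 0 if censored.
    Row j covers the interval (j-1, j]; only intervals after entry appear.
    """
    rows = []
    for j in range(entry + 1, exit + 1):
        y = 1 if (event == 1 and j == exit) else 0
        rows.append({"interval": j, "y": y})
    return rows


# Hypothetical person: first surveyed at t=5, transition in year 8.
rows = person_period(entry=5, exit=8, event=1)
for row in rows:
    print(row)
```

The binary variable y can then be used as the outcome in a logit or 
complementary log-log regression, with the interval entering through a 
duration-dependence specification of your choice.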

Professor Stephen P. Jenkins
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
