# Re: st: stset help, please

 From "Stephen P. Jenkins" <[email protected]> To [email protected] Subject Re: st: stset help, please Date Thu, 28 Nov 2002 15:02:58 +0000 (GMT Standard Time)

```
On Wed, 27 Nov 2002 23:37:46 -0800 (PST) Enrica Croda
<[email protected]> wrote:

> Dear Stata-experts:
>
> I am a newbie at survival analysis, and I would appreciate
> your help with properly setting up the dataset for the analysis
> with Stata 7.
>
> I have annual data for 15 years from a panel household survey
> on living arrangements of elderly households.
>
> My original data come in the form of an _unbalanced_ dataset,
> where the data are organized by ID and year (iis ID, tis year
> in xt-language).
>
> The data set covers the period 1984 through 1998 (15 years in
> total), but elderly households are in my sample only if the head
> of the household is older than 65. If they die or drop out of the
> survey, I have no records for them after the death or drop-out.

<snip>

Ignoring repeated spell issues for the moment, ...

The date at which people first become at risk of not living
independently need not coincide with the first date at which they were
observed in your panel. The first date is the one at which the survival
time clock starts ticking (t=0 in expressions for S(t)), whereas the
second date is relevant for "delayed entry" adjustments. In essence,
when modelling the time to transition to dependent living using your
type of data, you have to condition on 'survival' (remaining
independent) between t=0 and the date at which each person was first
surveyed (assuming that they were living independently then).

You might ease the problem by making the simplifying assumption that
everyone lives independently up to age 65 at least (which might be
reasonable for the majority of the population), and set t=0 for that
age. Things are more complicated if not everyone is living
independently at the start of the panel (though all the examples you
showed us had indep==1 in first wave observed) -- the reason being that
those already living dependently may be a non-random sample (a
'selection bias' sort of issue). [This may be relevant to your survey
because, if it is the survey I think it is, then the spouses of
household heads need not be 65 -- they may be younger or older.]

An additional complication is differential mortality and attrition,
which are presumably 'competing risks' with the hazard of not living
independently. Again some initial progress can be made by assuming the
competing risks are independent, and applying standard techniques. See
e.g. standard texts or
http://www.iser.essex.ac.uk/teaching/stephenj/ec968/index.php
[If you want to get into serious modelling of repeated spells &/or
non-independent competing risks, you'll get into the relatively
complicated world of "mixture hazard" models fairly quickly.]

An article addressing issues that are related to yours is:
Meghir, C. and Whitehouse, E. (1997), 'Labour market transitions and
retirement of men in the UK', Journal of Econometrics 79, 327-354.

Finally, do you have the exact dates at which transitions are made, or
just the survey year? You are using -stset- and -snapspan-, which in
effect assume the former. If it is the latter (as appears from your
output), you have grouped duration data ('interval censoring'), in
which case discrete time models may be a more appropriate way to
proceed. [See the URL above] The issues about defining entry and
delayed entry times, and competing risks, concern those models too,
of course.

Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk

```
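
The setup discussed in the post (clock starting at age 65, delayed entry at the first interview, death and attrition treated as independent censoring, and a discrete-time alternative for annual data) might be sketched in Stata roughly as follows. This is only an illustrative sketch, not the poster's or the author's actual code. The variable names `id`, `year`, `age` and `indep` are assumptions, as is the choice of log duration as the baseline hazard term. It also assumes one record per person per wave, `age` measured in years at interview, and everyone living independently when first observed.

```stata
* Sketch only. Hypothetical variable names: id, year, age, indep.
* Assumes one record per person per wave, age = age at interview,
* indep = 1 if living independently, and indep == 1 at first wave observed.

sort id year

* Event indicator: observed not living independently at this wave
gen byte dep = (indep == 0)

* Single-spell analysis: drop records after the first dependent wave
by id: gen prevdep = sum(dep) - dep
drop if prevdep > 0
drop prevdep

* Age at first interview serves as the delayed-entry time
by id: gen age0 = age[1]

* Continuous-time setup: the clock starts at age 65 (t = 0) and each
* person comes under observation at age0. Death or attrition simply ends
* the records without a failure, i.e. it is treated as independent
* right-censoring.
stset age, id(id) failure(dep==1) origin(time 65) enter(time age0)
stdescribe

* Discrete-time alternative for annual (grouped) data: each remaining
* record is one person-year at risk. The first wave is excluded because
* the sample is conditioned on being independent at that point.
sort id year
by id: gen byte first = (_n == 1)
gen lnt = ln(age - 65) if !first
cloglog dep lnt if !first
```

Covariates would be added to the `cloglog` line in the usual way. Note that in an unbalanced panel, a gap between waves is treated here as a period of continued independence, which may or may not be appropriate for the data at hand.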