st: transform panel data to fit duration models

From: "Ellen Van de Poel"
Subject: st: transform panel data to fit duration models
Date: Mon, 6 Aug 2007 14:39:58 +1000

```
Hi all,

I am trying to stset my panel data for a duration analysis, but I am not
sure whether what I'm doing is right or how I should deal with delayed
entry. Apologies for the rather long message, but I want to be clear about
what I do. My data come from a survey that interviews the same individuals
at 5 different times (waves) and look like this:

id  date of birth    date of interview    date of death      income
........
1   12 dec 1940       12 jan 1991          1 april 1998        200
1   12 dec 1940       13 feb 1993          1 april 1998        225
1   12 dec 1940       01 ma  1997          1 april 1998        230
2   15 jan 1961       15 jan 1991                  .           350
3   27 feb 1955       15 jan 1991                  .           100
3   27 feb 1955       22 feb 1993                  .           110
3   27 feb 1955       30 jan 1997                  .           130
3   27 feb 1955       05 sep 2000                  .           200
3   27 feb 1955       10 dec 2004                  .           180

If an individual dies between waves, the other household members are asked
for the exact date of death. In the example above, only the 1st individual
died, the 2nd attrited (was interviewed only in the first wave), and the 3rd
is observed in all waves and is still alive at the last one.
I first generate a time variable equal to each individual's age at the
interview, and add an extra record for each person who died, at the age of
death, with a dummy indicating the death, so the data look like this:

id   age     income   died   ........
1    50.08   200      .
1    52.17   225      .
1    56.25   230      .
1    57.60   .        1
2    30.00   350      .
3    35.91   100      .
3    37.95   110      .
3    41.92   130      .
3    45.67   200      .
3    49.92   180      .
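
* A minimal sketch of the two steps above (age at interview, plus an extra
* death record), assuming the raw wave data are typed in with dates as
* strings. Apart from id, income, age and died, all variable names (birth,
* interview, death, dob, dint, dod, last, deathrow) are hypothetical. Age is
* computed as days/365.25, so it can differ from the ages listed above. The
* third interview of id 1 is left out because its date is unreadable in the
* original ("01 ma  1997").
clear
input id str12 birth str12 interview str12 death income
1 "12 dec 1940" "12 jan 1991" "1 april 1998" 200
1 "12 dec 1940" "13 feb 1993" "1 april 1998" 225
2 "15 jan 1961" "15 jan 1991" "" 350
3 "27 feb 1955" "15 jan 1991" "" 100
3 "27 feb 1955" "22 feb 1993" "" 110
3 "27 feb 1955" "30 jan 1997" "" 130
3 "27 feb 1955" "05 sep 2000" "" 200
3 "27 feb 1955" "10 dec 2004" "" 180
end
* convert the date strings to Stata daily dates (missing if empty)
gen dob  = date(birth, "DMY")
gen dint = date(interview, "DMY")
gen dod  = date(death, "DMY")
format dob dint dod %td
* age in years at each interview
gen age = (dint - dob) / 365.25
gen died = .
* flag the last interview of each person, then duplicate it for those who died
bysort id (dint): gen byte last = _n == _N
expand 2 if last & !missing(dod), gen(deathrow)
* turn the duplicate into the death record: age at death, no income, died = 1
replace dint   = dod if deathrow
replace age    = (dod - dob) / 365.25 if deathrow
replace income = . if deathrow
replace died   = 1 if deathrow
sort id age
list id age income died, sepby(id) noobs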

I then use snapspan (snapspan id age died, gen(time0) replace) to transform
the data to:

id   time0   age     income   died   ........
1    .       50.08   .        .
1    50.08   52.17   200      .
1    52.17   56.25   225      .
1    56.25   57.60   230      1
2    .       30.00   .        .
3    .       35.91   .        .
3    35.91   37.95   100      .
3    37.95   41.92   110      .
3    41.92   45.67   130      .
3    45.67   49.92   200      .
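
* A minimal sketch that reproduces the snapspan step above from the
* person-age records listed earlier, typed in by hand. Judging from the
* listing above, income recorded at one interview is carried into the span
* that follows it, while died stays on the span that ends at death. As a
* result the first record of each person has missing time0 and income, and
* the income from a survivor's last interview (180 for id 3) is dropped.
clear
input id age income died
1 50.08 200 .
1 52.17 225 .
1 56.25 230 .
1 57.60 .   1
2 30.00 350 .
3 35.91 100 .
3 37.95 110 .
3 41.92 130 .
3 45.67 200 .
3 49.92 180 .
end
* snapshot records -> time-span records (time0, age]
snapspan id age died, gen(time0) replace
sort id age
list id time0 age income died, sepby(id) noobs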

But this means that I lose the last observed income of those who did not
die (e.g. the income of 180 recorded for individual 3 at the last wave). So
unless I am willing to treat income as a 'retrospective' variable, this
information is lost, and people who are observed only once (such as
individual 2) drop out of the analysis. Is that correct?
If so, how should I stset my data? I tried stset age, id(id) failure(died),
which seems to work, but it does not take into account that people are at
risk of dying from birth, not from when they enter the survey. I tried using
the age at first entry in the enter() option, but then all first
observations are ignored.
Does anyone have an idea what I am doing wrong and how I can solve this
problem?
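
* A sketch of the stset call with enter() described above, assuming the span
* records listed earlier (typed in by hand). Age is the analysis time, so the
* origin is birth (age 0), and entry is delayed to the age at the first
* interview; agefirst is a hypothetical helper variable. Records that end on
* or before agefirst get _st == 0, which is why the first (placeholder) record
* of each person, and individual 2 entirely, are not used.
clear
input id time0 age income died
1 .     50.08 .   .
1 50.08 52.17 200 .
1 52.17 56.25 225 .
1 56.25 57.60 230 1
2 .     30.00 .   .
3 .     35.91 .   .
3 35.91 37.95 100 .
3 37.95 41.92 110 .
3 41.92 45.67 130 .
3 45.67 49.92 200 .
end
* age at each person's first interview (end of the first span)
bysort id (age): gen agefirst = age[1]
* analysis time = age; a subject is at risk only after agefirst
stset age, id(id) failure(died) enter(time agefirst)
list id _t0 _t _d _st, sepby(id) noobs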