
st: R: Correct formatting of survival data

From   "Carlo Lazzaro" <>
To   <>
Subject   st: R: Correct formatting of survival data
Date   Mon, 4 Feb 2008 13:15:36 +0100

Dear Matthias,

although I have only basic experience with survival analysis, I will try to
give a tentative answer to your question, provided I have understood it
correctly.
My first suggestion is to apply the Kaplan-Meier survival function to your
dataset (collapsed here to one record per firm), as follows:
---------------------------begin example-----------------------------------
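* start from an empty dataset with one record per firm
clear
set obs 6
g id=_n
* In = year of establishment
g In=1977 in 1
replace In=1999 in 2
replace In=1980 in 3
replace In=1979 in 4
replace In=1987 in 5
replace In=1982 in 6
* Out = last year in which the firm is observed
g Out=1981 in 1
replace Out=2002 in 2
replace Out=1981 in 3
replace Out=1990 in 4
replace Out=1995 in 5
replace Out=1985 in 6
* firm 2 is right censored; all other firms fail
g faillure =0 in 2
replace faillure =1 if faillure==.
* time at risk in years since establishment
g risk_time=Out-In
stset risk_time, id(id) failure(faillure==1)
* Kaplan-Meier survivor function: table and graph
sts list
sts graph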
-----------------------end example----------------------------------
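
If you prefer to keep the multiple-record layout from your message, the
following is only a sketch of how -stset- could be checked on it. It assumes
that each record covers the one-year span ending in year (your begin=year-1).
It leaves out X, whose values are not given, and hence also the -snapspan-
step. -list- shows the analysis times Stata derives from each record, and
-stdescribe- summarises delayed entry and gaps per firm.
---------------------------begin example-----------------------------------
* sketch only: data typed in from the original message, X omitted
clear
input id year failure establishment
1 1981 1 1977
2 2000 0 1999
2 2001 0 1999
2 2002 0 1999
3 1981 1 1980
4 1980 0 1979
4 1981 0 1979
4 1989 0 1979
4 1990 1 1979
5 1992 0 1987
5 1995 1 1987
6 1983 0 1982
6 1984 0 1982
6 1985 1 1982
end
* assumption: each record covers the span (year-1, year]
g begin=year-1
stset year, id(id) time0(begin) origin(time establishment) failure(failure)
* _t0 and _t are measured in years since establishment
list id year begin _t0 _t _d _st, sepby(id)
* reports entry times and gaps per firm
stdescribe
-----------------------end example----------------------------------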

My second suggestion: for more details on this topic, I would refer you to
the following reference:

Cleves M, Gould W and Gutierrez R. An Introduction to Survival Analysis
Using Stata, 2nd rev ed. College Station, TX: Stata Press.

HTH and Kind Regards,


-----Original Message-----
[] On behalf of Flückiger
Sent: Monday, 4 February 2008 9:37
Subject: st: Correct formatting of survival data

Dear Statalisters 

I am currently trying to analyse a data set on firm survival. 
I have read up on various sources how to transform the data into the
appropriate survival analysis format.
Unfortunately I don't know anybody familiar with the topic of survival
analysis, so I don't know if what I've done so far is really correct.
If experienced survival data analysts could have a glance at my approach and
comment, that would be great.

Here is a sketch of what my dataset looks like:

id   year   X     failure   establishment

1    1981   X11   1         1977
2    2000   X21   0         1999
2    2001   X22   0         1999
2    2002   X23   0         1999
3    1981   X31   1         1980
4    1980   X41   0         1979
4    1981   X42   0         1979
4    1989   X43   0         1979
4    1990   X44   1         1979
5    1992   X45   0         1987
5    1995   X51   1         1987
6    1983   X61   0         1982
6    1984   X62   0         1982
6    1985   X63   1         1982

So there is left truncation, right censoring and possibly gaps within an id.

Continuous time analysis:

The commands I used to -snapspan- and -stset- the data set are:

g begin=year-1
snapspan id year failure, g(begin_span) replace
stset year, id(id) time0(begin) origin(time establishment) f(failure)

Am I making any (obvious) mistakes here?
In particular, I am not absolutely sure if my 'time0()' definition is ok.
I've tried to define the variable within the 'snapspanning' process (i.e.
begin_span), but Stata does not recognise the gaps in that case.

Discrete time analysis:

My main question here is whether I can include the firms with gaps in a
cloglog analysis (assuming I have brought the data into an appropriate
format for a cloglog model).

Thanks for any tips or comments

