From
"Carlo Lazzaro" <carlo.lazzaro@tin.it>

To
<statalist@hsphsun2.harvard.edu>

Subject
st: R: Correct formatting of survival data

Date
Mon, 4 Feb 2008 13:15:36 +0100

Dear Matthias,

basic though I am in dealing with survival analysis, I will try to give a tentative answer to your question, provided I have understood it well.

My first piece of advice would be to apply the Kaplan-Meier survival function to your dataset, as follows:

---------------------------begin example-----------------------------------
* one record per firm: entry year (In) and exit year (Out)
set obs 6
g id=_n
g In=1977 in 1
replace In=1999 in 2
replace In=1980 in 3
replace In=1979 in 4
replace In=1987 in 5
replace In=1982 in 6
g Out=1981 in 1
replace Out=2002 in 2
replace Out=1981 in 3
replace Out=1990 in 4
replace Out=1995 in 5
replace Out=1985 in 6
* firm 2 is censored; all other firms fail
g faillure=0 in 2
replace faillure=1 if faillure==.
* time at risk in years
g risk_time=Out-In
stset risk_time, id(id) failure(faillure==1)
sts list
sts graph
-----------------------end example----------------------------------

My second piece of advice: for more details on this topic, I would refer you to the following references:

http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/index.php

Cleves M, Gould W and Gutierrez R. An Introduction to Survival Analysis Using Stata, 2nd rev ed. College Station, TX: Stata Press.

HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Flückiger Matthias
Sent: Monday, 4 February 2008 9:37
To: statalist@hsphsun2.harvard.edu
Subject: st: Correct formatting of survival data

Dear Statalisters

I am currently trying to analyse a data set on firm survival. I have read up in various sources on how to transform the data into the appropriate survival analysis format. Unfortunately I don't know anybody familiar with survival analysis, so I don't know whether what I've done so far is really correct. If experienced survival data analysts could have a glance at my approach and comment, that would be great.
Here is a sketch of what my dataset looks like:

id   year   X     failure   establishment
1    1981   X11   1         1977
2    2000   X21   0         1999
2    2001   X22   0         1999
2    2002   X23   0         1999
3    1981   X31   1         1980
4    1980   X41   0         1979
4    1981   X42   0         1979
4    1989   X43   0         1979
4    1990   X44   1         1979
5    1992   X45   0         1987
5    1995   X51   1         1987
6    1983   X61   0         1982
6    1984   X62   0         1982
6    1985   X63   1         1982

So there is left truncation, right censoring and possibly gaps within an id.

Continuous time analysis:

The commands I used to -snapspan- and -stset- the data set are:

g begin=year-1
snapspan id year failure, g(begin_span) replace
stset year, id(id) time0(begin) origin(time establishment) f(failure)

Am I making any (obvious) mistakes here? In particular, I am not absolutely sure whether my time0() definition is ok. I've tried to define a variable within the snapspanning process (i.e. begin_span), but Stata does not recognise the gaps in that case.

Discrete time analysis:

My main question here is whether or not I can include the firms with gaps in a cloglog analysis (given that I have brought the data into an appropriate format for a cloglog model).

Thanks for any tips or comments
Mat

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
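One way to check the time0() definition from the question is to rebuild the example rows and look at what -stset- makes of them. The following is only a minimal sketch, not a definitive setup. It assumes that each row describes the one-year span ending in year (the assumption behind begin=year-1 in the question), and it leaves out the covariate X, whose values are only placeholders in the sketch of the data.

```stata
* Sketch only: example rows from the question, without the covariate X
clear
input id year failure establishment
1 1981 1 1977
2 2000 0 1999
2 2001 0 1999
2 2002 0 1999
3 1981 1 1980
4 1980 0 1979
4 1981 0 1979
4 1989 0 1979
4 1990 1 1979
5 1992 0 1987
5 1995 1 1987
6 1983 0 1982
6 1984 0 1982
6 1985 1 1982
end

* Assumption: each record covers the span (year-1, year]
generate begin = year - 1

* Same declaration as in the question
stset year, id(id) time0(begin) origin(time establishment) failure(failure)

* Inspect the analysis-time variables that -stset- created
list id year begin _t0 _t _d _st, sepby(id)

* Summarise entry times and gaps per subject
stdescribe
```

Under that assumption, -stdescribe- should count the firms whose records skip years (here ids 4 and 5) as subjects with gaps, and the listing of _t0 and _t shows the span each record contributes, which makes it easier to judge whether the spans are the intended ones.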

