From: "Stephen P. Jenkins" <stephenj@essex.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: left censoring, survival analysis
Date: Thu, 3 Sep 2009 09:52:47 +0100

------------------------------

Date: Wed, 2 Sep 2009 11:20:59 +0100
From: "V. Martini" <vinmar0@hotmail.com>
Subject: st: left censoring, survival analysis

Dear Stata Users,

I'm actually working on a sample of employed workers and I would like to know how to deal with left censoring in my data. My sample contains employed workers in 1998, therefore I observe one spell per worker. The end of the spells is found by using following waves. The beginning of each employment spell can be found because workers have been asked when they started the current job. But unfortunately, this information is not known for workers that started their spell before 1979. My interpretation is that these workers should be treated as left censored, as the date when their spell begins is not known. I only know that their spell started before 1979.

My questions are:

1) Can I deal with these data in STATA or should I remove the left censored observations? In my sample, it seems that having left censoring and duration of the spell are positively correlated, therefore deleting these observations is likely to have consequences on inference.

2) Is the use of the module INTCENS (by Jamie Griffin) appropriate for these data?

3) The guide to Survival Analysis by Cleves, Gould and Gutierrez suggests that, even if possibly different in nature, mathematically, left censored data can be treated as interval censored. In my case, I would observe one interval for each worker (intervals are very long and transitions occur at the end of the interval). Therefore, can I estimate my model using a probit / logit / cloglog model?

4) Finally, would st setting my data only indicating right censoring invalidate non parametric analysis (specifically, KM and NA estimates)?

Thanks,
Vinicio

=========================

Question 1
----------

If you don't know what the start date of a spell is ('left censoring'), then you can't figure out elapsed duration.
But most survival analysis models model the hazard rate (or log duration) as a function of elapsed duration. You're stuck. How does one cope with the lost information? I can think of 5 approaches:

(1) somehow try and get the lost information (typically infeasible).

Or use assumptions to substitute for data:

(2) drop the left-censored spells -- the typical practice, at least in social science. This is usually tempered with the worry that these spells are typically relatively long, and so dropping them will lead to a form of selection bias in estimates. (See e.g. paper by John Iceland at http://www.psc.isr.umich.edu/pubs/pdf/rr97-378.pdf.)

(3) assume that the hazard rate is constant (exponential hazard model for continuous time data; geometric for discrete time). In this case, the process doesn't depend on elapsed duration, and you can use the observed duration (it's a case in which left censoring turns into left truncation) -- so problem solved. But of course the assumption of constant hazard rate is likely to be unpalatable.

(4) suppose that the hazard is constant at all elapsed durations greater than some threshold value T* (e.g. T* = 5 years), where T* is chosen such that all left-censored spells are longer than T*. (You have to decide for yourself whether this is feasible in your situation.) Have a look at the article by Ann Huff Stevens ("Climbing Out of Poverty, Falling Back In: Measuring the Persistence of Poverty over Multiple Spells." Journal of Human Resources, Summer 1999.) She compares this strategy with strategy #2. She has a discrete time model and allows the baseline hazard to vary non-parametrically up to the threshold T*, and then holds it constant thereafter. Beware that I have not investigated this method in gory detail myself. (E.g. I haven't checked how she ensures that she has the correct number of person-months at risk of the event in her data set, nor how or whether the method can also allow frailty.)
(5) integrate out over all possible start dates -- a very technically demanding approach (to me, anyway) -- and hence not done too often. For an example, see e.g. Gottschalk, Peter, and Robert A. Moffitt (1994), 'Welfare dependence: concepts, measures, and trends', American Economic Review, 84 (2), 38-42. Moffitt and Rendall have a paper that does similar things, I recall. The idea is, crudely, that you write down the probability (likelihood) of the spell conditional on the spell starting at some specific date (t0, say), and then 'integrate' out over all possible t0. This needs some assumptions about the distribution of the t0, and modelling this usually uses auxiliary information (another reason why it's not often done). See also, on related matters, Steve Nickell's paper on unemployment duration in Econometrica 1979.

Question 2
----------

-intcens- is a nice module for estimating parametric survival analysis models using interval-censored data (though it does not allow time-varying covariates). For other approaches to interval-censored data, see my Survival Analysis MS and Lessons at my website (URL below).

Question 3
----------

I don't have the Cleves et al. reference to hand. But my response is, in effect, already given in my response to Q2. (In short, yes, there are approaches to modelling interval-censored data that utilize -logit- and -cloglog- etc.)

Question 4
----------

See response to Q1. Again, the issue is whether omission of left-censored observations leads to bias or not.

Good luck

Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
http://www.iser.essex.ac.uk
Survival Analysis using Stata: http://www.iser.essex.ac.uk/iser/teaching/module-ec968
Downloadable papers and software: http://ideas.repec.org/e/pje7.html

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
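
Approach (3) in the answer to Question 1 relies on the memoryless property of a constant hazard: if the hazard does not depend on elapsed duration, the time a spell lasts after observation begins has the same exponential distribution whatever the unknown start date was. The Python simulation below is a minimal sketch of that point under assumed values (hazard of 0.2 per year, start dates uniform over the 10 years before observation begins, no right censoring). The function names are illustrative and do not come from any Stata module.

```python
import random


def simulate_observed_durations(n, hazard, window_start, seed=0):
    """Simulate n spells with a constant hazard and keep those still in
    progress at window_start (hypothetical set-up, for illustration only).

    Returns the time each retained spell lasts AFTER window_start; the
    elapsed time before window_start is deliberately discarded, as if the
    start date were unknown.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        start = rng.uniform(0.0, window_start)  # true start date, treated as unknown
        duration = rng.expovariate(hazard)      # total spell length, constant hazard
        end = start + duration
        if end > window_start:                  # spell in progress when observation begins
            observed.append(end - window_start)
    return observed


def exponential_hazard_mle(durations):
    """Constant-hazard MLE with no right censoring: exits / total time at risk."""
    return len(durations) / sum(durations)


if __name__ == "__main__":
    true_hazard = 0.2
    obs = simulate_observed_durations(200_000, true_hazard, window_start=10.0, seed=1)
    print("spells in progress at window start:", len(obs))
    print("estimated hazard from observed durations only:",
          round(exponential_hazard_mle(obs), 4))
```

The estimate uses only time observed after the window opens, yet it should land close to the true hazard. That holds only because the simulated hazard really is constant; with duration dependence the same calculation would be biased, which is the caveat attached to approach (3).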
