
From: "Yaseen Ghulam" <Yaseen.Ghulam@port.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Survival Analysis Issue
Date: Fri, 07 Nov 2003 10:54:28 -0000

Dear Stephen,

Thank you very much for your help; your notes are very useful. As we understand from your reply, there are two options for dealing with left censoring in our case:

1. Assume that the hazard does not vary with survival time, drop the time variable, and see what happens.
2. Drop those workers who joined before April 1996.

Shabbar Jaffry
Yaseen Ghulam


st: Survival Analysis Issue

From: "Yaseen Ghulam" <Yaseen.Ghulam@port.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: Survival Analysis Issue
Date: Wed, 05 Nov 2003 11:23:40 -0000

Dear Stata users,

We are currently working on a study of workers' behaviour in terms of leaving the organisation prematurely, before their contracts expire. In particular, the idea is to use past data to find out who is likely to quit and when. We would appreciate any help.

Our data are typical organisational data. Let me briefly explain what we have.

Our administrative data set is in person-month format. It has monthly observations from April 1996 to July 2002 (75 monthly periods) for approximately 73 thousand workers (3.39 million person-month records). In other words, the observation window runs from April 1996 to July 2002. Of these 73 thousand workers, roughly 20 thousand quit the organisation prematurely during the observation period (20 thousand failures). The remainder are right censored.

The data set also includes individuals who joined before 1996, that is, before the observation window opened. However, we have no information on those who joined before 1996 and also left before 1996 (left censoring). For those who joined after 1996 and either stayed or left (delayed entry) before the end of the observation period (July 2002), we have complete data.

Our data set contains the usual job-related variables (e.g. what job each person performs) and demographic variables (e.g. gender, marital status).
We have introduced external factors into the data set, e.g. the number of vacancies, claimant counts, a manufacturing productivity index, the inflation rate and a manufacturing-sector earnings index. These time-varying covariates have been merged with the data set described above by calendar month (time).

Our questions are:

1. Can Stata deal with left censoring, right censoring and left truncation (delayed entry) simultaneously?
2. Should we use only those workers who joined after April 1996 and throw away those who joined before 1996 (because of left censoring)?
3. We would like to predict which workers are likely to leave and when. This means calculating, for the right-censored workers, the probability of failure and the expected time to failure over the next few years, based on the observation-period data (April 1996 to July 2002). If there are many right-censored cases, does this affect the quality of the predictions? I suppose these predictions should be limited to the next 6 years, as our observation span is only 6 years.

Has anybody written macros or programs in Stata to carry out these predictions within a survival analysis framework? Ideally these would take into account the issues above and the type of data we have. We would greatly appreciate any help.

Shabbar Jaffry
Yaseen Ghulam
University of Portsmouth
U.K.


st: RE: Survival Analysis Issue

From: "Stephen P. Jenkins" <stephenj@essex.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Survival Analysis Issue
Date: Thu, 6 Nov 2003 10:18:38 -0000

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu On Behalf Of Yaseen Ghulam
> Sent: 05 November 2003 11:24
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Survival Analysis Issue
>
> ... snip ...
>
> Our questions are:
>
> 1. Can Stata deal with left censoring, right censoring and left
> truncation (delayed entry) simultaneously?
> 2. Should we use only those workers who joined after April 1996 and
> throw away those who joined before 1996 (because of left censoring)?

You have interval-censored (banded) survival time data, a.k.a. discrete time data. With this kind of data it is no problem at all to handle left truncation combined with right censoring. [Have a look at the lecture notes and Stata lessons at http://www.iser.essex.ac.uk/teaching/stephenj/ec968/index.php]

Left-censored data is more problematic. It's straightforward to handle if you are prepared to assume that the hazard rate does not vary with survival time. That's a strong, probably unacceptable, assumption, but you might want to see what happens.
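Here is a minimal sketch of the constant-hazard option. It is written in Python rather than Stata purely for illustration, and it is not the code from the lessons cited above. It assumes one hypothetical 0/1 quit flag per person-month at risk inside the observation window, and it ignores covariates.

If the monthly hazard h does not depend on tenure, every person-month at risk is a Bernoulli(h) trial. The maximum-likelihood estimate of h is then simply exits divided by person-months at risk. Spells whose start date is unknown can be kept, because only the months actually observed enter the likelihood.

```python
# Sketch only: constant monthly quit hazard from person-month records.
# Hypothetical input: one 0/1 quit flag per person-month at risk inside the
# observation window. Tenure is deliberately ignored ("drop the time
# variable"), which is what allows left-censored spells to stay in the sample.


def constant_hazard(quit_flags):
    """ML estimate of a constant monthly hazard: exits / person-months at risk."""
    flags = list(quit_flags)
    if not flags:
        raise ValueError("no person-months at risk")
    return sum(flags) / len(flags)


def survival_after(h, months):
    """P(still employed after `months` more months) = (1 - h) ** months."""
    return (1.0 - h) ** months


def expected_remaining_months(h):
    """Mean months until quitting under a constant hazard (geometric on 1, 2, ...)."""
    return float("inf") if h == 0 else 1.0 / h


if __name__ == "__main__":
    # Worker A: observed for 4 months, quits in the 4th; the spell may have
    # started before the window, which does not matter under this assumption.
    # Worker B: observed for 6 months, still employed at the end (right censored).
    months = [0, 0, 0, 1] + [0, 0, 0, 0, 0, 0]
    h = constant_hazard(months)
    print(h)                             # 0.1
    print(survival_after(h, 12))
    print(expected_remaining_months(h))  # 10.0
```

Under this assumption every worker has the same expected remaining time, whatever their tenure. That illustrates why the assumption is strong.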
Otherwise, the standard way of handling the left-censoring is to drop those spells.

> 3. We would like to predict which workers are likely to leave and
> when. This means calculating, for the right-censored workers, the
> probability of failure and the expected time to failure over the next
> few years, based on the observation-period data (April 1996 to July
> 2002). If there are many right-censored cases, does this affect the
> quality of the predictions? I suppose these predictions should be
> limited to the next 6 years, as our observation span is only 6 years.
> Has anybody written macros or programs in Stata to carry out these
> predictions within a survival analysis framework? Ideally these would
> take into account the issues above and the type of data we have.

If you look at the lessons on discrete time models cited above, you'll see examples of Stata code showing how to do within-sample and out-of-sample predictions of the sort that you are asking about.

Stephen

-------------------------------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
http://www.iser.essex.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
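Here is a minimal sketch of the discrete-time logic behind Stephen's answers to questions 1 and 3. It is again written in Python rather than Stata, has no covariates, and is not the code from the cited lessons. It assumes person-month records that carry a hypothetical tenure-in-months value and a 0/1 quit flag.

- **Left truncation (delayed entry).** A worker has no records for tenures before entering the window, so they are never counted as at risk there.
- **Right censoring.** Censored workers simply stop contributing records.
- **Hazard estimate.** The per-tenure hazard (exits divided by number at risk) corresponds, in this no-covariate case, to a logit model with one dummy per tenure month.
- **Prediction.** This step chains those hazards forward for a worker who is still employed. It stops where the data give no estimate rather than extrapolating.

```python
# Sketch only: tenure-specific discrete-time hazards with delayed entry and
# right censoring, plus forward prediction for a still-employed worker.
# Hypothetical input: (tenure_month, quit_flag) pairs, one per person-month.
from collections import defaultdict


def tenure_hazards(records):
    """Return {tenure: exits / number at risk} from person-month records."""
    at_risk = defaultdict(int)
    exits = defaultdict(int)
    for tenure, quit in records:
        at_risk[tenure] += 1  # only months actually observed count as at risk
        exits[tenure] += quit
    return {t: exits[t] / at_risk[t] for t in at_risk}


def predict_quit_path(hazards, current_tenure, horizon):
    """For a worker still employed at `current_tenure`, list
    (tenure, P(quit in that month), P(still employed at month end))
    for up to `horizon` months; stop at the first tenure with no estimate."""
    path = []
    surv = 1.0
    for t in range(current_tenure + 1, current_tenure + horizon + 1):
        if t not in hazards:
            break  # outside the observed tenure range: do not extrapolate
        p_quit = surv * hazards[t]
        surv *= 1.0 - hazards[t]
        path.append((t, p_quit, surv))
    return path


if __name__ == "__main__":
    records = [
        (1, 0), (2, 0), (3, 1),  # A: joined in window, quits at tenure 3
        (2, 0), (3, 0), (4, 0),  # B: delayed entry at tenure 2, censored
        (1, 0), (2, 1),          # C: quits at tenure 2
    ]
    h = tenure_hazards(records)
    print(predict_quit_path(h, current_tenure=2, horizon=5))
    # [(3, 0.5, 0.5), (4, 0.0, 0.5)]
```

The last survival value in the path (0.5 here) is the predicted probability that the worker is still employed at the end of the usable horizon. A full expected time to quitting would need hazards beyond the observed tenure range, which this sketch deliberately does not invent.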


© Copyright 1996–2022 StataCorp LLC