
st: RE: Survival Analysis Issue

From   "Stephen P. Jenkins" <>
To   <>
Subject   st: RE: Survival Analysis Issue
Date   Thu, 6 Nov 2003 10:18:38 -0000

> -----Original Message-----
> From: 
> [] On Behalf Of 
> Yaseen Ghulam
> Sent: 05 November 2003 11:24
> To:
> Subject: st: Survival Analysis Issue
> Dear Stata users,
> Currently we are working on a study of workers' behaviour in terms of
> leaving the organisation prematurely, before their contract expires.
> In particular, the idea is to find out who is likely to quit, and when,
> using past data. We would appreciate it if someone could provide some help.
> The data we have are typical organisational data. Let me briefly explain
> what data set we have.
> In our administrative data set we have person-month data with monthly
> observations from April 1996 to July 2002 (75 monthly spells - time) for
> approx. 73 thousand workers (3.39m cases), implying that these workers
> came under observation from April 1996 and stayed under observation
> till July 2002. Of these 73 thousand workers, roughly 20 thousand quit
> the organisation prematurely during the observation period (20 thousand
> fail cases). The remainder are right censored.
> The data set also includes individuals who joined before 1996 (i.e. before
> the observation window). However, we have no information on those who
> joined before 1996 and left before 1996 (left censoring).
> For those who joined after 1996 and either stayed or left (delayed entry)
> before the end of the observation period (July 2002), we have complete data.
... snip ...
> Our questions are:
> 1. Can Stata deal with left censoring, right censoring and left
> truncation (delayed entry) simultaneously?
> 2. Should we use only those workers who joined after Apr 1996 and
> throw away those who joined before 1996 (because of left censoring)?

You have interval-censored (banded) survival time data, a.k.a. discrete
time data, for which it is no problem at all to handle left-truncated
data combined with right censoring.
[Have a look at the lecture notes and Stata lessons at]
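
To sketch why delayed entry causes no difficulty in discrete time (the
notation here is introduced for illustration and is not taken from the
original post): suppose worker i has already completed u_i months of
tenure when first observed (u_i = 0 for someone hired inside the
observation window), is last observed in tenure month t_i, and has
c_i = 1 if they quit in that month and c_i = 0 if right censored. With
h_{ik} the hazard of quitting in month k, the likelihood contribution
conditions on survival to u_i, so only the observed at-risk months enter:

```latex
L_i \;=\; \left[ \frac{h_{i t_i}}{1 - h_{i t_i}} \right]^{c_i}
          \prod_{k = u_i + 1}^{t_i} \left( 1 - h_{ik} \right)
```

This has the same form as a binary-outcome likelihood over person-month
records, so the months of tenure before a worker enters the observation
window are simply left out of the data.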

Left-censored data is more problematic. It's straightforward to handle if
you are prepared to assume that the hazard rate does not vary with
survival time. That's a strong, probably unacceptable, assumption -- but
you might want to see what happens.
Otherwise the standard way of handling the left-censoring is to drop
those spells.
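
A brief sketch of why the constant-hazard assumption helps, using
illustrative notation that is not part of the original exchange: let s_i
be the unknown tenure already elapsed when worker i is first observed,
m_i the number of months observed at risk, c_i = 1 for a quit and 0 for
right censoring, and h_i a hazard that does not vary with tenure. The
probability of surviving a further m months does not depend on s_i, so
the unknown start date drops out of the likelihood contribution:

```latex
\Pr(T_i > s_i + m \mid T_i > s_i) \;=\; (1 - h_i)^{m}
\quad\Longrightarrow\quad
L_i \;=\; h_i^{\,c_i} \, (1 - h_i)^{\,m_i - c_i}
```

Once the hazard is allowed to vary with tenure, L_i depends on s_i, which
is not observed for these spells.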

> 3. We would like to predict which workers are likely to leave, and when.
> This means calculating the probability of failure and the expected time of
> failure over the next few years for right-censored workers, on the basis of
> the observation period data (April 1996 to July 2002). If there are many
> right-censored cases, does that affect the quality of the predictions? I
> suppose these predictions should be limited to the next 6 years only, as
> our observation span is only 6 years.
> Has anybody written any macros or programs in Stata to carry out these
> predictions, taking into account the issues mentioned above and the type
> of data we have, within a survival analysis framework?

If you look at the lessons on discrete time models cited above, you'll
see examples of Stata code showing how to do within-sample and
out-of-sample predictions of the sort that you are asking about.
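
As a rough illustration of the arithmetic behind such predictions (this
is not the code from the lessons, and it is written in Python only to
keep the sketch self-contained), suppose a fitted discrete time model
has already produced predicted monthly hazards for a worker who is still
employed at the end of the observation window. The function name and the
example hazards below are hypothetical.

```python
def predict_exit(hazards):
    """Turn predicted monthly hazards into exit predictions.

    hazards: list of predicted probabilities of quitting in each future
    month, each conditional on still being employed at the start of that
    month (assumed to come from a previously fitted model).

    Returns (survival, prob_exit, restricted_mean):
      survival        - probability of still being employed after each month
      prob_exit       - probability of quitting within the horizon
      restricted_mean - expected number of months until exit, with exits
                        beyond the horizon counted as occurring at the horizon
    """
    if not hazards:
        return [], 0.0, 0.0

    survival = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h          # survive this month given survival so far
        survival.append(s)

    prob_exit = 1.0 - survival[-1]

    # E[min(T, H)] = S(0) + S(1) + ... + S(H-1), with S(0) = 1
    restricted_mean = 1.0 + sum(survival[:-1])
    return survival, prob_exit, restricted_mean


# Hypothetical hazards for the next two months
survival, prob_exit, restricted_mean = predict_exit([0.5, 0.5])
print(survival, prob_exit, restricted_mean)
```

The expected time is restricted to the prediction horizon because the
hazard beyond the last predicted month is not known; extrapolating far
beyond the tenure range seen in the data would rest entirely on the
assumed shape of the baseline hazard.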

Professor Stephen P. Jenkins <>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374.  Fax: +44 1206 873151.   
