Stata The Stata listserver

Re: st: Survival Analysis Issue


From   "Yaseen Ghulam" <Yaseen.Ghulam@port.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Survival Analysis Issue
Date   Tue, 11 Nov 2003 10:17:20 -0000

Dear Stephen and other Stata specialists,

First of all, sorry for the double posting, and thank you very much for your help. Your notes
on survival analysis are great and very helpful.
As we understand from your reply, there are two options for dealing with
left censoring in our case. Do you agree?

1. Assume that the hazard does not vary with survival time, drop the time
variable, and see what happens.
2. Drop those workers who joined before April 1996. 

Further, once the model is estimated using the discrete-time method, how can one
check, using the predicted survival or hazard, whether the model's in-sample
predictions at the individual level are correct?

Shabbar Jaffry
Yaseen Ghulam


> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [<mailto:owner-statalist@hsphsun2.harvard.edu>] On Behalf Of 
> Yaseen Ghulam
> Sent: 05 November 2003 11:24
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Survival Analysis Issue
> 
> 
> Dear Stata users,   
> 
> Currently we are working on a study of workers' behaviour in terms of
> leaving the organisation prematurely, before their contract expires.
> In particular, the idea is to use past data to find out who is likely
> to quit and when. We would appreciate any help.
> 
> Our data are typical organisational data. Let me briefly explain what
> the data set contains.
> 
> Our administrative data set contains person-month data, with monthly
> observations from April 1996 to July 2002 (75 monthly time periods),
> for approximately 73 thousand workers (3.39m person-month records).
> That is, workers come under observation from April 1996 and stay under
> observation until July 2002 at the latest. Of these 73 thousand
> workers, roughly 20 thousand quit the organisation prematurely during
> the observation period (20 thousand failures). The remainder are
> right-censored.
> 
> The data set also includes individuals who joined before the
> observation window opened (April 1996). However, we have no
> information on those who both joined and left before the window
> opened (left censoring).
> 
> For those who joined after April 1996 and either stayed or left
> (delayed entry) before the end of the observation period (July 2002),
> we have complete data.
>   
... snip ...
> 
> Our questions are:  
> 
> 1. Can Stata deal with left censoring, right censoring, and left
> truncation (delayed entry) simultaneously?
> 2. Should we use only those workers who joined after April 1996, and
> throw away the cases who joined before 1996 (because of left
> censoring)?

You have interval-censored (banded) survival time data, a.k.a. discrete
time data, for which it is no problem at all to handle left-truncated
data combined with right censoring.
[Have a look at the lecture notes and Stata lessons at
http://www.iser.essex.ac.uk/teaching/stephenj/ec968/index.php]
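As a rough illustration of why delayed entry is unproblematic in the discrete-time framework, here is a minimal Python sketch (not Stata, and not taken from the lecture notes above). It assumes hypothetical spell-level records in which each worker's start date, and hence tenure at first observation, is known; the field names are made up for the example. A worker who joined before April 1996 simply contributes person-months from their first observed tenure month onwards, so they enter the risk set late, and right-censored workers contribute rows with no event. The raw life-table hazard at the end stands in for the regression model (for example a logit or complementary log-log model with duration dummies) that would normally be fitted to these rows.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    """Hypothetical spell-level record for one worker."""
    worker_id: int
    first_t: int  # tenure month (1 = first month in the job) first observed in the window
    last_t: int   # tenure month last observed (quit month or end of window)
    quit: bool    # True if the spell ends in a quit, False if right-censored


def person_months(spells):
    """Expand spells into (worker_id, tenure_month, event) rows.

    A worker who joined before the window opened starts contributing at
    first_t > 1, i.e. enters the risk set late (left truncation). Only the
    final row of a completed spell has event = 1; censored spells have none.
    """
    rows = []
    for s in spells:
        for t in range(s.first_t, s.last_t + 1):
            event = 1 if (s.quit and t == s.last_t) else 0
            rows.append((s.worker_id, t, event))
    return rows


def hazard_by_month(rows):
    """Life-table hazard: quits at tenure month t / workers at risk at t."""
    at_risk, quits = {}, {}
    for _, t, event in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        quits[t] = quits.get(t, 0) + event
    return {t: quits[t] / at_risk[t] for t in sorted(at_risk)}


# Hypothetical data: worker 2 joined before the window and is first seen
# at tenure month 2, so it is not in the risk set for month 1.
spells = [Spell(1, 1, 3, True), Spell(2, 2, 4, False), Spell(3, 3, 3, True)]
print(hazard_by_month(person_months(spells)))
```

The point of the design is that each row's hazard is conditional on having survived to that tenure month, so spells first seen part-way through contribute correctly without any special treatment.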

Left-censored data are more problematic. They are straightforward to
handle if you are prepared to assume that the hazard rate does not vary
with survival time. That's a strong, probably unacceptable, assumption,
but you might want to see what happens.
Otherwise, the standard way of handling left censoring is to drop
those spells.
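To see why the constant-hazard assumption makes left-censored spells usable, note that if the monthly quit hazard h is the same at every tenure, the likelihood depends only on the number of person-months at risk N and the number of quits d observed inside the window, h^d (1 - h)^(N - d), which is maximised at h = d / N. A worker's unknown start date then plays no role. The Python sketch below illustrates this; the record layout and the numbers are hypothetical.

```python
def constant_hazard(records):
    """ML estimate of a constant monthly quit hazard.

    records: list of (months_at_risk_in_window, quit) pairs, one per worker.
    Only months observed inside the window are counted, so the (possibly
    unknown) tenure at which a worker entered the window is never used.
    """
    person_months = sum(months for months, _ in records)
    quits = sum(1 for _, quit in records if quit)
    return quits / person_months


def survival(h, k):
    """Probability of still being employed k months ahead under hazard h."""
    return (1.0 - h) ** k


# Hypothetical records: two quits in 100 person-months at risk.
records = [(10, True), (24, False), (5, True), (61, False)]
h = constant_hazard(records)
print(h)                # 0.02
print(survival(h, 12))  # about 0.785
```

Under this assumption the expected time until a quit, counted from any point, is 1/h months (50 months here) whatever the worker's current tenure, which is exactly why it is such a strong assumption for quit behaviour.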

> 3. We would like to predict which workers are likely to leave and
> when. That means calculating, for the right-censored workers, the
> probability of failure and the expected time of failure over the next
> few years, on the basis of the observation-period data (April 1996 to
> July 2002). If there are many right-censored cases, does that affect
> the quality of the predictions? I suppose these predictions should be
> limited to the next 6 years only, as our observation span covers only
> 6 years.
> Has anybody written any macros or programs in Stata to carry out these
> predictions within a survival analysis framework, taking into account
> the issues above and the type of data we have?

If you look at the lessons on discrete time models cited above, you'll
see examples of Stata code showing how to do within-sample and
out-of-sample predictions of the sort that you are asking about.
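The core of such predictions is that, in discrete time, a worker still employed at tenure month c survives to month c + k with probability equal to the product of (1 - h_j) over months j = c + 1, ..., c + k, where h_j is the predicted hazard for month j. The Python sketch below is a minimal illustration of that identity, not the Stata code from the lessons; the hazard values are hypothetical and in practice would come from the fitted model evaluated at a given worker's covariates. It refuses to predict beyond the last tenure month for which a hazard is available, which is one way of respecting the limit of the observation span raised in question 3.

```python
def conditional_survival(hazard, current_t, horizon):
    """P(still employed at month current_t + horizon | employed at current_t).

    hazard: dict mapping tenure month -> predicted monthly quit hazard.
    """
    s = 1.0
    for t in range(current_t + 1, current_t + horizon + 1):
        if t not in hazard:
            raise ValueError(f"no hazard for tenure month {t}: would need extrapolation")
        s *= 1.0 - hazard[t]
    return s


def prob_quit_within(hazard, current_t, horizon):
    """Predicted probability of quitting within the next `horizon` months."""
    return 1.0 - conditional_survival(hazard, current_t, horizon)


# Hypothetical predicted hazards for one worker, tenure months 1..4.
example_hazard = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.04}
print(conditional_survival(example_hazard, 1, 2))  # about 0.9506
print(prob_quit_within(example_hazard, 1, 3))      # about 0.0874
```

For an in-sample check, summing the predicted hazards over all person-month rows in a given month gives the model's expected number of quits in that month, which can be set against the observed count.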


Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374.  Fax: +44 1206 873151.
<http://www.iser.essex.ac.uk>   




*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


