Statalist The Stata Listserver


st: streg, enter and origin


From   "Stephen P. Jenkins" <[email protected]>
To   <[email protected]>
Subject   st: streg, enter and origin
Date   Fri, 5 May 2006 09:16:39 +0100

+++++++++++++++++++++++++++++++++++++++++++++

Date: Fri, 5 May 2006 08:25:35 +0200
From: Henrik L A <[email protected]>
Subject: st: streg, enter and origin

Dear Statalisters,

In writing my thesis, my (lack of) knowledge of Stata's -stset-
command seems to have become a problem; the options `enter' and
`origin' are especially confusing. (I have consulted the ST manual
many times, but that did not help.)

My data is a stock sample of a population that is followed from
randomisation on 1 January 1992 until 1 January 2005. I have data for
date of birth (in the range from 1927 to 1969) and date of death
(for those who die).

For the survival times, I have generated a variable called `survival'
that counts the days each person survives from day 0 (1.1.1992) until
day 4,749 (1.1.2005). For censoring/failure, I have generated a dummy
called `failure' that is equal to one for those who die, and zero
otherwise. Finally, the month, day, and year of birth are stored in
variables called `bm', `bd', and `by'.

The analysis I ultimately want to do is a Cox or a parametric  
regression with the likelihood function weighted by the survivor  
function to deal with the length-biased sampling issue. For this  
purpose I have -stset- my data like this:


. stset survival, failure(failure) origin(time mdy(bm,bd,by)) enter(time mdy(1,1,1992))

<snip>

So, my question is whether the procedure above is correct and, if not,
whether there is a better way to do the -stset-.
++++++++++++++++++++++++++++++++++++++++++++++++


To me, it looks as if the problem is the way you defined the survival
time variable ("survival"). People become at risk of dying when they are
born (age zero), and so the total time at risk should run from birth
until last observed (not from 1 Jan 1992). Of course, and as you
correctly observe, the likelihood function for a death hazard regression
model needs to account for the stock sampling (known in other
disciplines as 'left truncation', or 'delayed entry'). In short, one
needs to condition on the fact that, of all those born on a particular
date before 1 Jan 1992, only the relatively long-lived survive until
1 Jan 1992. (Economists might also think of this as a particular type of
endogenous sample selection.)
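
For reference, the conditioning described above is usually written as
follows. The notation is assumed here and does not come from the
original posts: t_i is age at death or censoring, t_{0i} is age on
1 Jan 1992, d_i is the failure indicator, and f, h and S are the
density, hazard and survivor functions. Each person's likelihood
contribution is divided by the probability of surviving to the
sampling date:

```latex
L_i \;=\; \frac{f(t_i)^{d_i}\, S(t_i)^{1-d_i}}{S(t_{0i})}
    \;=\; \frac{h(t_i)^{d_i}\, S(t_i)}{S(t_{0i})},
\qquad t_i > t_{0i} .
```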

I too have found the difference between origin() and enter() options in
-stset- confusing, but a re-read of the "Key concepts" discussion (e.g.
version 9 manual, [ST], pp. 321-322) usually puts me back on track
quickly.
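
One possible reading of the advice above, as a sketch rather than a
definitive answer: make the time variable a calendar date of exit, so
that it is on the same scale as the dates given to origin() and
enter(). In the original command `survival' counts days since 1 Jan
1992, whereas mdy() returns days since 1 Jan 1960, so the two are not
comparable. The variables `birthdate' and `exitdate' below are
hypothetical helpers, the scale(365.25) option is an optional
convenience, and the Weibull model is only illustrative. The ///
continuation assumes the commands are run from a do-file.

```stata
* Sketch only: assumes the poster's variables survival, failure, bm, bd, by.
* birthdate and exitdate are hypothetical helper variables.

* Birth date as a Stata daily date (days since 1 Jan 1960)
generate birthdate = mdy(bm, bd, by)
format birthdate %td

* Date last observed: survival counts days from 1 Jan 1992 (day 0)
generate exitdate = mdy(1, 1, 1992) + survival
format exitdate %td

* origin(): when a person becomes at risk (birth), so analysis time is age
* enter():  when a person comes under observation (1 Jan 1992)
* scale():  report analysis time in years rather than days
stset exitdate, failure(failure) origin(time birthdate) ///
    enter(time mdy(1, 1, 1992)) scale(365.25)

* _t0 should now hold age on 1 Jan 1992, and _t age at death or censoring
summarize _t0 _t

* Parametric hazard model with delayed entry; distribution is illustrative
streg, distribution(weibull)
```

With the data -stset- this way, -streg- and -stcox- use _t0 as the
entry time, which is how the delayed entry is taken into account.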

Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374.  Fax: +44 1206 873151.
http://www.iser.essex.ac.uk  
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/ 

