Statalist The Stata Listserver



st: streg, enter and origin


From   Henrik L A <henrikla@undermybed.dk>
To   statalist@hsphsun2.harvard.edu
Subject   st: streg, enter and origin
Date   Fri, 5 May 2006 08:25:35 +0200

Dear Statalisters,

While writing my thesis, I have found that my (lack of) knowledge of Stata's -stset- command has become a problem; in particular, the options `enter' and `origin' confuse me. (I have consulted the ST manual many times, but that did not help.)

My data are a stock sample of a population that is followed from randomisation on 1 January 1992 until 1 January 2005. I have data on date of birth (ranging from 1927 to 1969) and date of death (for those who die).

For the survival times, I have generated a variable called `survival' that counts each observation's days of survival from day 0 (1 January 1992) until day 4,749 (1 January 2005). For censoring/failure, I have generated a dummy called `failure' that equals one for observations that die and zero otherwise. Finally, the month, day, and year of birth are stored in variables called `bm', `bd', and `by'.
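As a minimal sketch of the variable construction described above, the following Python code (an illustration standing in for the equivalent Stata steps; the helper `make_survival` is hypothetical) derives `survival' and `failure' for one person. It assumes that survivors have no death date and that anyone still alive, or dying after the end of follow-up, is censored on 1 January 2005.

```python
from datetime import date

START = date(1992, 1, 1)  # randomisation date: day 0
END = date(2005, 1, 1)    # end of follow-up: day 4,749


def make_survival(death=None):
    """Return (survival, failure) for one person (hypothetical helper).

    survival: days from 1 Jan 1992 to death, or to 1 Jan 2005 if censored
    failure:  1 if the person died during follow-up, else 0
    Deaths after 1 Jan 2005 are treated as censored (an assumption).
    """
    if death is not None and death <= END:
        return (death - START).days, 1
    return (END - START).days, 0


print(make_survival())                   # alive at end of follow-up
print(make_survival(date(1998, 6, 15)))  # died during follow-up
```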

The analysis I ultimately want to run is a Cox or parametric regression in which the likelihood function is weighted by the survivor function to deal with length-biased sampling. For this purpose I have -stset- my data as follows:

. stset survival, failure(failure) origin(time mdy(bm,bd,by)) enter(time mdy(1,1,1992))

     failure event:  failure != 0 & failure < .
obs. time interval:  (origin, survival]
 enter on or after:  time mdy(1,1,1992)
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time mdy(bm,bd,by)

------------------------------------------------------------------------------
    92348  total obs.
        0  exclusions
------------------------------------------------------------------------------
    92348  obs. remaining, representing
     4717  failures in single record/single failure data
 4.29e+08  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =      8037
                                  last observed exit t =     28485
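To make the roles of the two options concrete, the following Python sketch (not Stata code) mimics how they place each record on the analysis-time scale: `origin' sets t = 0 at birth, `enter' sets the start of the at-risk interval t0, and the record covers (t0, t]. The helpers `mdy` and `analysis_interval` are hypothetical stand-ins written for illustration. The sketch assumes that `survival' already includes the 11,688-day shift and that every subject was born before entry, as in these data; a record whose exit is not after its entry is dropped, roughly mirroring the exclusions -stset- reports.

```python
from datetime import date

EPOCH = date(1960, 1, 1)  # Stata's %td dates count days from 1 Jan 1960


def mdy(m, d, y):
    """Hypothetical Python stand-in for Stata's mdy(): days since 1 Jan 1960."""
    return (date(y, m, d) - EPOCH).days


def analysis_interval(exit_date, origin_date, entry_date):
    """Return the at-risk interval (t0, t] on the analysis-time scale.

    All arguments are Stata-style elapsed dates (days since 1 Jan 1960).
    Assumes origin (birth) precedes entry. Returns None when exit is not
    after entry, i.e. the record contributes no time at risk.
    """
    t0 = entry_date - origin_date  # age in days on 1 Jan 1992
    t = exit_date - origin_date    # age in days at death or censoring
    return (t0, t) if t > t0 else None


# Hypothetical person born 15 Mar 1940 and censored on 1 Jan 2005
birth = mdy(3, 15, 1940)
censor_date = mdy(1, 1, 1992) + 4749  # shifted `survival' value
print(analysis_interval(censor_date, birth, mdy(1, 1, 1992)))
```

Because both ends of the interval are measured from the same origin, the interval length t - t0 equals the original, unshifted `survival' value (4,749 days here).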

In the command above, I have already added a constant of 11,688 (= mdy(1,1,1992)) to the `survival' variable, because observations born after 1960 were excluded in an earlier version of my -stset-. Presumably this happened because Stata stores 1 January 1960 as day 0, so observations born after 1960 ended up with negative survival times once I introduced `origin' (since t = `survival' - origin).
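The arithmetic in the paragraph above can be checked with a short Python sketch; `mdy` is again a hypothetical stand-in for Stata's date function, and the birth date and `survival' value are made up for illustration. Without the shift, a `survival' value is read on Stata's date scale as a day between 1960 and 1973, so subtracting a post-1960 birth date can give a negative time; with the shift, exit dates fall between 1992 and 2005, on the same calendar scale as the origin.

```python
from datetime import date, timedelta

EPOCH = date(1960, 1, 1)  # day 0 for Stata's %td dates


def mdy(m, d, y):
    """Hypothetical Python stand-in for Stata's mdy()."""
    return (date(y, m, d) - EPOCH).days


SHIFT = mdy(1, 1, 1992)  # the constant added to `survival'

birth = mdy(7, 1, 1965)  # hypothetical person born after 1960
survival = 1000          # days since 1 Jan 1992, as in `survival'

# Without the shift, `survival' is interpreted as days since 1 Jan 1960
print(EPOCH + timedelta(days=survival))          # 1962-09-27
print(survival - birth)                          # negative analysis time

# With the shift, exit is on the same calendar scale as the origin
print(EPOCH + timedelta(days=SHIFT + survival))  # 1994-09-27
print(SHIFT + survival - birth)                  # age in days at exit
```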

So my question is whether the procedure above is correct and, if not, whether there is a better way to -stset- the data.



Kind regards,

Henrik Lindegaard
Aarhus, Denmark





© Copyright 1996–2014 StataCorp LP