RE: st: Survival analysis question

From	"Feiveson, Alan H. (JSC-SK311)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Survival analysis question
Date	Thu, 4 Nov 2010 09:02:57 -0500
Hi Steve - Ok - I agree that in real life, the time-to-event starts over again on the second test. What I meant by "times building up consecutively" is that using the accumulated time within a subject (ttrxt) is a way of getting -stset- to properly interpret this as multiple-failure records. If I am interpreting the results from -stset- correctly, it reports exactly what I want - namely the line 

" obs. time interval:  (ttrxt[_n-1], ttrxt]"


seems to say to me that for a given subject, ttrxt is treated as a pseudo calendar time (ignoring the week in-between), not the failure time. The actual time-to-event starts counting over again at the previous value of ttrxt.
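
One way to check this reading of the -stset- output is to list the entry and exit times that -stset- creates for each record. The following is only a sketch: the data are re-typed from the table in the original post, the variable names id and post follow the -streg- output below, and the 0/1 coding of post and fail is an assumption. ttrxt is rebuilt as a running within-subject sum of t.

```stata
* Sketch only: data re-typed from the table in the original post.
* Assumed coding: post = 0 (pre), 1 (post); fail = 1 (failed), 0 (completed).
clear
input id post fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1  84
3 0 1 564
3 1 1 296
4 0 1 168
4 1 1 332
5 0 1 215
5 1 1  50
6 0 0 900
6 1 1 196
7 0 0 900
7 1 1 298
8 0 0 900
8 1 1 280
end

* Accumulated time at risk within subject (169 and 310 for subject 1, etc.)
bysort id (post): gen ttrxt = sum(t)

* Multiple-failure setup: exit(time .) keeps a subject at risk after a failure
stset ttrxt, id(id) failure(fail) exit(time .)

* _t0 and _t are the entry and exit times that -stset- assigns to each record
list id post t ttrxt _t0 _t _d, sepby(id) noobs
```

In the listing, each record should span (_t0, _t] with _t - _t0 equal to t, so the 5607 seconds at risk is the sum of the individual test times. The risk sets used by -stcox- are still formed on the accumulated scale _t.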

Thanks for the idea of doing a sanity check by using a paired t-test or regression ignoring the censoring. So when you did this, you got a "t-value" of -2.37 and a p-value of 0.049 using clustered SE's. 

When I run -stcox- with my version of the -stset- command, I get  a "z-value" of 2.05 (the sign is reversed because failure times are less and the hazard is increased for post = 1) with a p-value of 0.040. This seems to agree as much as one could expect when ignoring the censoring in the first analysis.

I also ran a parametric Weibull model with -streg-

. streg post,dist(weibull) cluster(id) nolog

         failure _d:  fail
   analysis time _t:  ttrxt
  exit on or before:  time .
                 id:  id

Weibull regression -- log relative-hazard form 

No. of subjects      =            8                Number of obs   =        16
No. of failures      =           13
Time at risk         =         5607
                                                   Wald chi2(1)    =      5.19
Log pseudolikelihood =   -11.631281                Prob > chi2     =    0.0227

                                     (Std. Err. adjusted for 8 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |    4.03776   2.473611     2.28   0.023     1.215284    13.41539
-------------+----------------------------------------------------------------
       /ln_p |   -.080116   .1865371    -0.43   0.668    -.4457219    .2854899
-------------+----------------------------------------------------------------
           p |   .9230093   .1721754                      .6403618    1.330414
         1/p |   1.083413   .2020966                      .7516459    1.561617
------------------------------------------------------------------------------

. predict mt,mean time

. tabdisp post,cell(mt)

-----------------------------
     post | Predicted mean _t
----------+------------------
        0 |          831.7354
        1 |          183.3516
-----------------------------

The predicted mean time-to-failure (MTF) again is in the ballpark of what you get when ignoring censoring:

. table post,con(mean t)

----------------------
     post |    mean(t)
----------+-----------
        0 |     491.25
        1 |    209.625
----------------------


For the post=0 case, the predicted MTF with proper account of censoring is higher (831.7 vs 491.3), but that's what it should be. For the post=1 case, where there is no censored data, the two estimates are quite close.

So I still think the way I did the -stset- is correct, but I may have misled you with the cryptic statement "times building up consecutively".


Al











-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 5:05 PM
To: [email protected]
Subject: Re: st: Survival analysis question


> Steve - I think there is a communication problem here. The event is  
> a subject reaching a state of presyncopy during an upright tilt.  
> Subjects are given the tilt test with Treatment 1 ("pre"), then one  
> week later they are given the test with Treatment 2 ("post").  
> Subjects aren't at risk during the week in between because they  
> aren't doing the tilt test. But I see there is no way you would know  
> this from the data alone. Therefore I would like to claim that in  
> effect "times" can be considered as building up consecutively. Does  
> this make sense?
>
> Al
>

It doesn't make sense to me, Al. Assume that there was no treatment  
(or that the treatments were the same). For the times to be considered  
as "building up consecutively,"  an individual's inherent survival  
curve for the second test would continue  where the first curve left  
off.  The length of time between the two tests make this very  
unlikely. Too many (unmeasured) factors  that affect response will  
differ between the tests. I think this would be true even if the tests  
were separated by just a few hours, though here issues of treatment  
order, carry-over, changed physiological state, and prior outcome  
would also enter.

Put it another way: Suppose you were measuring an outcome that was not
censored. Wouldn't you do a standard paired-data analysis? Let's see what
happens if I do this, ignoring the censoring, and compare the results
to those from a clustered regression of the individual times.

. bys subjectid: gen diff = time[2] - time[1]
. preserve
. bys subjectid: keep if _n==1
(8 observations deleted)

. mean diff   //paired analysis

Mean estimation                     Number of obs    =       8
--------------------------------------------------------------
              |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         diff |   -281.625   114.6071     -552.6277   -10.62231
--------------------------------------------------------------
. restore

. reg time treatment, cluster(subjectid)

Linear regression                                      Number of obs =      16
  [output skipped]
                              (Std. Err. adjusted for 8 clusters in subjectid)
------------------------------------------------------------------------------
              |               Robust
         time |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+---------------------------------------------------------------
    treatment |   -281.625   118.6296    -2.37   0.049    -562.1394   -1.110568
        _cons |     491.25   133.6418     3.68   0.008     175.2374    807.2626
------------------------------------------------------------------------------

The point estimates are the same, and the standard errors are close.   
(In fact, if you jackknife the clusters, the standard errors are  
identical.)   By analogy, clustered -stcox- on the individual times is  
the way to go. The fact that you can't get sensible survival curves  
for your approach just reinforces this conclusion.

Steve




-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 2:40 PM
To: [email protected]
Subject: Re: st: Survival analysis question

--


Al,

I don't think that the two times are consecutive: they are recorded as
seconds, but the two observations on each subject were separated
by a week.

Steve

On Nov 3, 2010, at 2:50 PM, Feiveson, Alan H. (JSC-SK311) wrote:

Steve - In my opinion this is multiple failure data. Each subject is
subjected to two consecutive exposures, and a subject can "fail" on
none, either, or both of these tests. So the variable ttrxt at a given
observation is the total time that the particular subject has been at
risk up through that observation. Therefore I think the stset command

. stset ttrxt, id(id) failure(fail) exit(time .)

                id:  id
     failure event:  fail != 0 & fail < .
obs. time interval:  (ttrxt[_n-1], ttrxt]
exit on or before:  time .

------------------------------------------------------------------------------
       16  total obs.
        0  exclusions
------------------------------------------------------------------------------
       16  obs. remaining, representing
        8  subjects
       13  failures in multiple failure-per-subject data
     5607  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =      1198

is correct. I agree that ideally, one should try a frailty model on
this data, but it doesn't work well with only 8 subjects.

Al Feiveson




-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 12:35 PM
To: [email protected]
Subject: Re: st: Survival analysis question


Chris Westby:


You don't have multiple-failure data, because the start time for the
two tests should be zero. The correct statement is:

stset t, failure(fail)

This will change the -stcox- results as well. Also try -stsum,
by(treatment)- after the two versions of -stset-. I suggest that you
consider the -shared()- option in -stcox- to allow for the possibility
of person-specific baseline hazards. Note that eight subjects is
probably not enough for the standard errors to be reliable.
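
The steps suggested above might be sketched as follows. The data are re-typed from the table in the original post; treatment is assumed to be coded 0 = pre and 1 = post, and -vce(cluster subjectid)- is used as the current spelling of the -cluster()- option. The -sts graph- and -stcurve- lines are additions for checking the setup and are not part of the suggestion itself. The -shared()- fit is wrapped in -capture noisily- because a frailty model may not behave well with only 8 subjects.

```stata
* Sketch only: data re-typed from the table in the original post.
* Assumed coding: treatment = 0 (pre), 1 (post); fail = 1 (failed), 0 (completed).
clear
input subjectid treatment fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1  84
3 0 1 564
3 1 1 296
4 0 1 168
4 1 1 332
5 0 1 215
5 1 1  50
6 0 0 900
6 1 1 196
7 0 0 900
7 1 1 298
8 0 0 900
8 1 1 280
end

* Each test starts its own clock at zero: one record per test, no id()
stset t, failure(fail)

* Time at risk, incidence rate, and survival-time quartiles by treatment
stsum, by(treatment)

* Kaplan-Meier curves by treatment as a check on the setup
sts graph, by(treatment)

* Cox model on the individual times, robust SEs clustered on subject
stcox treatment, vce(cluster subjectid)

* Survival curves from the fitted Cox model
stcurve, survival at1(treatment=0) at2(treatment=1)

* Optional: shared-frailty Cox model (may be unstable with 8 subjects)
capture noisily stcox treatment, shared(subjectid)
```

Running -stsum, by(treatment)- after each version of -stset- gives a quick comparison of how the two setups count time at risk.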


Steve

Steven J. Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783


On Nov 3, 2010, at 8:35 AM, Westby, Christian Michael. (JSC-SK)[USRA]
wrote:

Dear Statalisters,

I am working on comparing survival times in one group of subjects
before and after treatment and am having a hard time with the "stset"
code.


Using the following data set where testing was separated by 1 week, t
is time of task before and after treatment (seconds) and ttrxt is time
calculated to prevent time from being treated as continuous and fail
is 0=completed, 1=not completed.



subjectid   treatment   fail          t     ttrxt
-------------------------------------------------
1           pre         failed      169       169
1           post        failed      141       310
2           pre         failed      114       114
2           post        failed       84       198
3           pre         failed      564       564
3           post        failed      296       860
4           pre         failed      168       168
4           post        failed      332       500
5           pre         failed      215       215
5           post        failed       50       265
6           pre         completed   900       900
6           post        failed      196      1096
7           pre         completed   900       900
7           post        failed      298      1198
8           pre         completed   900       900
8           post        failed      280      1180
-------------------------------------------------


I used


. stset ttrxt, id(subjectid) failure(fail) exit(time .)


                id:  subjectid
     failure event:  fail != 0 & fail < .
obs. time interval:  (ttrxt[_n-1], ttrxt]
 exit on or before:  time .

------------------------------------------------------------------------------
       16  total obs.
        0  exclusions
------------------------------------------------------------------------------
       16  obs. remaining, representing
        8  subjects
       13  failures in multiple failure-per-subject data
     5607  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =      1198


I then ran


. stcox treatment, cluster(subjectid)

         failure _d:  fail
   analysis time _t:  ttrxt
  exit on or before:  time .
                 id:  subjectid

Iteration 0:   log pseudolikelihood = -20.175132
Iteration 1:   log pseudolikelihood = -18.079165
Iteration 2:   log pseudolikelihood = -18.026011
Iteration 3:   log pseudolikelihood = -18.025935
Refining estimates:
Iteration 0:   log pseudolikelihood = -18.025935

Cox regression -- no ties

No. of subjects      =            8                Number of obs   =        16
No. of failures      =           13
Time at risk         =         5607
                                                   Wald chi2(1)    =      4.22
Log pseudolikelihood =   -18.025935                Prob > chi2     =    0.0399

                              (Std. Err. adjusted for 8 clusters in subjectid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   4.610013   3.428317     2.05   0.040     1.073226    19.80218
------------------------------------------------------------------------------


I believe that the output and results are accurate; however, I am
unable to get Stata to correctly graph the survival curves using the
following code



. stcurv, surv at1(treatment=0) at2(treatment=1)


The resulting graph incorrectly plots both groups starting at less
than 100% at time=0, and the x-axis scale is incorrect.


Any thoughts?



Chris


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
