Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: survival analysis question (multiple failure and not multiple failure)
Date: Fri, 5 Nov 2010 16:28:19 -0500

Hi Steve -

I re-read the discussion on multiple failures in the Stata FAQ and what did I see for Sec. 3.1.1 (unordered failure events of the same type)?

==============================================================================

Each patient has two observations in the dataset, one for the treated eye (treat=1) and another for the "control" eye, treat=0. The data, therefore, contain 394 observations. Each eye is assumed to enter the study at time 0 and it is followed until blindness develops or censoring occurs. The follow-up time is given by the variable time. The four observations listed above correspond to patients with id=5 and id=14.

After creating the dataset, it is then stset as usual. The id() option, however, is not specified. Specifying id() would cause stset to interpret subjects with the same id() as the same sampling unit and would drop them because of overlapping study times. Thus, we type

    . stset time, failure(cens)

==============================================================================

This is exactly our design (not 3.2.4) and thus is consistent with your original -stset- suggestion:

    . stset t,fail(fail)

I think the confusion on my part was caused by not realizing that to set this up properly, one must make Stata "think" Chris' data is not multiple failure data by not using the id() option in -stset-, and then analyzing with cluster(id) or shared(id). I kept trying to find a way to set up the analysis using the id() option, thinking it was necessary.

Thus after the above -stset- command, one gets the message "13 failures in single record/single failure data" when in fact, I would say this is indeed multiple-failure data. Even by Stata's terminology, this design is considered multiple-failure, if for no other reason than the discussion of it occurs in the above FAQ in Section 3.1.1. Of course, Stata doesn't know any better when we use -stset- without the id() option.
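A minimal sketch of that setup, for anyone who wants to try it. It assumes the eight-subject data from Chris's original post (quoted at the bottom of this thread), re-entered here with the variable names id, post, fail and t; the -input- step is an illustrative reconstruction, not something posted in the thread.

```stata
* Sketch only: data re-entered from the table in Chris's original post.
* post = 0 for the "pre" test, 1 for the "post" test; fail = 1 if the test was not completed.
clear
input id post fail t
1 0 1 169
1 1 1 141
2 0 1 114
2 1 1  84
3 0 1 564
3 1 1 296
4 0 1 168
4 1 1 332
5 0 1 215
5 1 1  50
6 0 0 900
6 1 1 196
7 0 0 900
7 1 1 298
8 0 0 900
8 1 1 280
end

* No id() option: each test is its own record and analysis time starts at 0
stset t, failure(fail)

* The within-subject dependence is handled at the estimation step instead
stcox post, cluster(id) nolog

* Shared-frailty alternative mentioned in the thread (may be unstable with 8 subjects):
* stcox post, shared(id)
```

With these data, the -stset- summary reported in this thread shows 16 observations, 13 failures and 5607 total analysis time at risk, which is a quick check that the data were entered as intended.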
So while I disagree with the first part of your original statement, "You don't have multiple-failure data, because the start time for the two tests should be zero..." etc., I finally see why your recommendation is the correct way to go in Stata.

Steve, thanks for all the time you have spent discussing this issue!

Al

I think a lot of the confusion (on my part, anyway) was that when you do -stset- without the id() option, the output says "13 failures in single record/single failure data", whereas technically I would still call this "multiple failure data" (in fact, this model appears in the "multiple failure" section of the FAQ).

    . stset t,fail(fail)

         failure event:  fail != 0 & fail < .
    obs. time interval:  (0, t]
     exit on or before:  failure

    ------------------------------------------------------------------------------
             16  total obs.
              0  exclusions
    ------------------------------------------------------------------------------
             16  obs. remaining, representing
             13  failures in single record/single failure data
           5607  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =       900

which does not include the "id" option, and then, as you suggest, analyze with the -cluster- option:

    . stcox post,cluster(id) robust nolog

             failure _d:  fail
       analysis time _t:  t

    Cox regression -- no ties

    No. of subjects      =           16                Number of obs   =        16
    No. of failures      =           13
    Time at risk         =         5607
                                                       Wald chi2(1)    =      4.04
    Log pseudolikelihood =   -27.449277                Prob > chi2     =    0.0443

                                        (Std. Err. adjusted for 8 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |   2.762629   1.395949     2.01   0.044     1.026154    7.437593
    ------------------------------------------------------------------------------

"You don't have multiple-failure data, because the start time for the two tests should be zero..." is correct - at least in Stata lingo.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels
Sent: Thursday, November 04, 2010 2:59 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Survival analysis question

Al-

I was incorrect on (at least!) one point: your -stset- does indeed give the correct time at risk, but the results of -stsum-, -stcox-, and -stcurve- still differ from those that use the -stset- formula in 3.2.4

Steve
sjsamuels@gmail.com

Al-

I think that the appropriate model is the "conditional risk set model (time from the previous event)" of Section 3.2.4 at http://www.stata.com/support/faqs/stat/stmfail.html#cond2 . The -stset- for that model excludes the -id()- option. If you omit "id(id)" from your -stset- statement, you get the correct total analysis time for the first two ids.

For all eight subjects, the results of -stset- followed by -stsum-, -stcox- and -stcurve- are identical for:

    stset time, failure(fail) exit(time .) time0(time0)   //the formula of sec 3.2.4 in the FAQ

and

    stset time, failure(fail)   //my original suggestion

Steve

For the -stcox- analysis of also gives the correct total failure time and

On Nov 4, 2010, at 1:23 PM, Feiveson, Alan H. (JSC-SK311) wrote:

Hi Steve -

OK - So I tried what was suggested in the link. To make this really simple I just did -stset- for the first two id's (with all failures):

    . gen time0=0

    . list id treat post fail t ttrxt time0 if id<=2 ,sepby(id)

         +------------------------------------------------+
         | id   treat   post   fail     t   ttrxt   time0 |
         |------------------------------------------------|
      1. |  1     pre      0      1   169     169       0 |
      2. |  1    post      1      1   141     310       0 |
         |------------------------------------------------|
      3. |  2     pre      0      1   114     114       0 |
      4. |  2    post      1      1    84     198       0 |
         +------------------------------------------------+

    . stset t, id(id) failure(fail) exit(time .) enter(time0) if(id<=2)

                    id:  id
         failure event:  fail != 0 & fail < .
    obs. time interval:  (t[_n-1], t]
     enter on or after:  time time0
     exit on or before:  time .
                    if:  id<=2

    ------------------------------------------------------------------------------
             16  total obs.
             12  ignored per request (if(), etc.)
    ------------------------------------------------------------------------------
              4  obs. remaining, representing
              2  subjects
              4  failures in multiple failure-per-subject data
            283  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =       169

But the total time at risk should be 169 + 141 + 114 + 84 = 508 (not 283). Note 283 = 169 + 114 is the sum of the "pre" failure times.

Now, I redefine my "time0" variable to be where the previous test left off and use the cumulated time as the time variable:

    . replace time0=ttrxt[_n-1] if post==1
    (8 real changes made)

    . stset ttrxt, id(id) failure(fail) exit(time .) enter(time0) if(id<=2)

                    id:  id
         failure event:  fail != 0 & fail < .
    obs. time interval:  (ttrxt[_n-1], ttrxt]
     enter on or after:  time time0
     exit on or before:  time .
                    if:  id<=2

    ------------------------------------------------------------------------------
             16  total obs.
             12  ignored per request (if(), etc.)
    ------------------------------------------------------------------------------
              4  obs. remaining, representing
              2  subjects
              4  failures in multiple failure-per-subject data
            508  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =       310

and I get the correct total time at risk. However, equivalently, I could do what I did before without the "enter(time0)":

    . stset ttrxt, id(id) failure(fail) exit(time .) if(id<=2)

                    id:  id
         failure event:  fail != 0 & fail < .
    obs. time interval:  (ttrxt[_n-1], ttrxt]
     exit on or before:  time .
                    if:  id<=2

    ------------------------------------------------------------------------------
             16  total obs.
             12  ignored per request (if(), etc.)
    ------------------------------------------------------------------------------
              4  obs. remaining, representing
              2  subjects
              4  failures in multiple failure-per-subject data
            508  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =       310

and I still get the correct time at risk.

Am I missing something? Shouldn't the total time at risk just be the sum of the "t's"?

Al

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels
Sent: Thursday, November 04, 2010 11:05 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Survival analysis question

--

Al and Chris:

I should correct a previous statement of mine. You do formally have multiple-failure data, with a not-at-risk gap between test dates. But I think that the proper analysis is a "time from previous entry" as in http://www.stata.com/support/faqs/stat/stmfail.html#cond2 , Section 3.2.4. The approach there of putting the second test data into a separate stratum won't work, because you want to compare the first and second times.

Steve

> Steve - I think there is a communication problem here. The event is
> a subject reaching a state of presyncopy during an upright tilt.
> Subjects are given the tilt test with Treatment 1 ("pre"), then one
> week later they are given the test with Treatment 2 ("post").
> Subjects aren't at risk during the week in between because they
> aren't doing the tilt test. But I see there is no way you would know
> this from the data alone. Therefore I would like to claim that in
> effect "times" can be considered as building up consecutively. Does
> this make sense?
>
> Al
>

It doesn't make sense to me, Al. Assume that there was no treatment (or that the treatments were the same). For the times to be considered as "building up consecutively," an individual's inherent survival curve for the second test would continue where the first curve left off. The length of time between the two tests makes this very unlikely.
Too many (unmeasured) factors that affect response will differ between the tests. I think this would be true even if the tests were separated by just a few hours, though here issues of treatment order, carry-over, changed physiological state, and prior outcome would also enter.

Put it another way: Suppose you were measuring an outcome that was not censored. Wouldn't you do a standard paired-data analysis? Let's see what happens if I do this, ignoring the censoring, and compare the results to those from a clustered regression of the individual times.

    . bys subjectid: gen diff = time[2] - time[1]

    . preserve

    . bys subjectid: keep if _n==1
    (8 observations deleted)

    . mean diff    //paired analysis

    Mean estimation                     Number of obs    =       8

    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
            diff |   -281.625   114.6071     -552.6277   -10.62231
    --------------------------------------------------------------

    . restore

    . reg time treatment, cluster(subjectid)

    Linear regression                               Number of obs =      16
    [output skipped]
                                 (Std. Err. adjusted for 8 clusters in subjectid)
    ------------------------------------------------------------------------------
                 |               Robust
            time |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       treatment |   -281.625   118.6296    -2.37   0.049    -562.1394   -1.110568
           _cons |     491.25   133.6418     3.68   0.008     175.2374    807.2626
    ------------------------------------------------------------------------------

The point estimates are the same, and the standard errors are close. (In fact, if you jackknife the clusters, the standard errors are identical.) By analogy, clustered -stcox- on the individual times is the way to go. The fact that you can't get sensible survival curves for your approach just reinforces this conclusion.
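
The jackknife remark above can be checked with a short sketch along these lines. It assumes the variable names used in the commands above (subjectid, treatment, time), with treatment coded 0 = pre and 1 = post and time taken from the t column of Chris's table; the -input- step and the use of the -jackknife- prefix are illustrative choices, not commands posted in the thread.

```stata
* Sketch only: data re-entered from the table in Chris's original post
clear
input subjectid treatment time
1 0 169
1 1 141
2 0 114
2 1  84
3 0 564
3 1 296
4 0 168
4 1 332
5 0 215
5 1  50
6 0 900
6 1 196
7 0 900
7 1 298
8 0 900
8 1 280
end

* Paired analysis: one post-minus-pre difference per subject
preserve
bysort subjectid (treatment): gen diff = time[2] - time[1]
bysort subjectid (treatment): keep if _n == 1
mean diff
restore

* Same regression as above, but with delete-one-cluster jackknife standard errors
jackknife, cluster(subjectid): regress time treatment
```

If the data are entered as above, -mean diff- should return the -281.625 reported in this post, and the jackknife standard error for treatment can then be compared directly with the paired-analysis standard error.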

Steve

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Survival analysis question

--

Al, I don't think that the two times are consecutive: they are recorded as seconds, but the two observations on each subject were separated by a week.

Steve

On Nov 3, 2010, at 2:50 PM, Feiveson, Alan H. (JSC-SK311) wrote:

Steve - In my opinion this is multiple failure data. Each subject is subjected to two consecutive exposures, and a subject can "fail" on none, either, or both of these tests. So the variable ttrxt at a given observation is the total time that the particular subject has been at risk up through that observation. Therefore I think the stset command

    . stset ttrxt, id(id) failure(fail) exit(time .)

                    id:  id
         failure event:  fail != 0 & fail < .
    obs. time interval:  (ttrxt[_n-1], ttrxt]
     exit on or before:  time .

    ------------------------------------------------------------------------------
             16  total obs.
              0  exclusions
    ------------------------------------------------------------------------------
             16  obs. remaining, representing
              8  subjects
             13  failures in multiple failure-per-subject data
           5607  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =      1198

is correct. I agree that ideally, one should try a frailty model on this data, but it doesn't work well with only 8 subjects.

Al Feiveson

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels
Sent: Wednesday, November 03, 2010 12:35 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Survival analysis question

Chris Westby:

You don't have multiple-failure data, because the start time for the two tests should be zero.
The correct statement is:

    stset t, failure(fail)

This will change the -stcox- results as well. Also try -stsum, by(treatment)- after the two versions of -stset-.

I suggest that you consider the -shared- option in -stcox- to allow for the possibility of person-specific baseline hazards. Note that eight subjects is probably not enough for the standard errors to be reliable.

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

On Nov 3, 2010, at 8:35 AM, Westby, Christian Michael. (JSC-SK)[USRA] wrote:

Dear Statalisters,

I am working on comparing survival times in one group of subjects before and after treatment and am having a hard time with the "stset" code. I am using the following data set, where testing was separated by 1 week; t is time of task before and after treatment (seconds), ttrxt is time calculated to prevent time from being treated as continuous, and fail is 0=completed, 1=not completed.

    subjectid   treatment   fail          t     ttrxt
    -----------------------------------------------------------------
        1       pre         failed      169       169
        1       post        failed      141       310
        2       pre         failed      114       114
        2       post        failed       84       198
        3       pre         failed      564       564
        3       post        failed      296       860
        4       pre         failed      168       168
        4       post        failed      332       500
        5       pre         failed      215       215
        5       post        failed       50       265
        6       pre         completed   900       900
        6       post        failed      196      1096
        7       pre         completed   900       900
        7       post        failed      298      1198
        8       pre         completed   900       900
        8       post        failed      280      1180
    -----------------------------------------------------------------

I used

    . stset ttrxt, id(subjectid) failure(fail) exit(time .)

                    id:  subjectid
         failure event:  fail != 0 & fail < .
    obs. time interval:  (ttrxt[_n-1], ttrxt]
     exit on or before:  time .

    ------------------------------------------------------------------------------
             16  total obs.
              0  exclusions
    ------------------------------------------------------------------------------
             16  obs. remaining, representing
              8  subjects
             13  failures in multiple failure-per-subject data
           5607  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =      1198

I then ran

    . stcox treatment, cluster(subjectid)

             failure _d:  fail
       analysis time _t:  ttrxt
      exit on or before:  time .
                     id:  subjectid

    Iteration 0:   log pseudolikelihood = -20.175132
    Iteration 1:   log pseudolikelihood = -18.079165
    Iteration 2:   log pseudolikelihood = -18.026011
    Iteration 3:   log pseudolikelihood = -18.025935
    Refining estimates:
    Iteration 0:   log pseudolikelihood = -18.025935

    Cox regression -- no ties

    No. of subjects      =            8                Number of obs   =        16
    No. of failures      =           13
    Time at risk         =         5607
                                                       Wald chi2(1)    =      4.22
    Log pseudolikelihood =   -18.025935                Prob > chi2     =    0.0399

                                 (Std. Err. adjusted for 8 clusters in subjectid)
    ------------------------------------------------------------------------------
                 |               Robust
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       treatment |   4.610013   3.428317     2.05   0.040     1.073226    19.80218
    ------------------------------------------------------------------------------

I believe that the output and results are accurate; however, I am unable to get Stata to correctly graph the survival curves using the following code

    . stcurve, surv at1(treatment=0) at2(treatment=1)

The resulting graph incorrectly plots both groups starting at less than 100% at time=0, and the x-axis scale is incorrect.

Any thoughts?

Chris

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: Survival analysis question** *From:* "Westby, Christian Michael. (JSC-SK)[USRA]" <christian.westby@nasa.gov>
- **Re: st: Survival analysis question** *From:* Steven Samuels <sjsamuels@gmail.com>
- **RE: st: Survival analysis question** *From:* "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- **Re: st: Survival analysis question** *From:* Steven Samuels <sjsamuels@gmail.com>
- **RE: st: Survival analysis question** *From:* "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- **Re: st: Survival analysis question** *From:* Steven Samuels <sjsamuels@gmail.com>
- **RE: st: Survival analysis question** *From:* "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- **Re: st: Survival analysis question** *From:* Steven Samuels <sjsamuels@gmail.com>
