Stata: Data Analysis and Statistical Software
   >> Home >> Products >> Capabilities >> Survival example
order stataorder stata

Survival example

The input data for the survival-analysis commands are duration records: each observation records a span of time over which the subject was observed, along with an outcome at the end of the period. There can be one record per subject or, if covariates vary over time, multiple records.

You can obtain simple descriptions:

. webuse cancer
(Patient Survival in Drug Trial)

. stset study died

     failure event:  died != 0 & died < .
obs. time interval:  (0, studytime]
 exit on or before:  failure

------------------------------------------------------------------------------
       48  total obs.
        0  exclusions
------------------------------------------------------------------------------
       48  obs. remaining, representing
       31  failures in single record/single failure data
      744  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        39

. rename drug dose

. label define dosage 1 "Control" 2 "5 mg" 3 "10 mg"

. label values dose dosage

. stdescribe

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects               48   
no. of records                48           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                       15.5           1       12.5         39

subjects with gap              0   
time on gap if gap             0   
time at risk                 744        15.5           1       12.5         39

failures                      31    .6458333           0          1          1
------------------------------------------------------------------------------
. stsum, by(dose)

         |               incidence       no. of    |------ Survival time -----|
dose     | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
 Control |          180   .1055556            20          4         8        12
    5 mg |          209   .0287081            14         13        22        23
   10 mg |          355   .0169014            14         25        33         .
---------+---------------------------------------------------------------------
   total |          744   .0416667            48          8        17        33

You can also compare the survivor functions

. sts list, by(dose) compare

                 -----Survivor Function------
dose            Control       5 mg      10 mg
---------------------------------------------
time       1     0.9000     1.0000     1.0000
           5     0.6000     1.0000     1.0000
           9     0.4500     0.8512     0.9286
          13     0.2250     0.7448     0.8571
          17     0.1125     0.6207     0.8571
          21     0.1125     0.6207     0.8571
          25          .     0.2069     0.6857
          29          .     0.2069     0.5878
          33          .          .     0.4408
          37          .          .     0.4408
          41          .          .          .
---------------------------------------------

or you can review the complete life table:

. sts list, by(dose)

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
Control 
     1       20      2      0             0.9000    0.0671     0.6560    0.9740
     2       18      1      0             0.8500    0.0798     0.6038    0.9490
     3       17      1      0             0.8000    0.0894     0.5511    0.9198
     4       16      2      0             0.7000    0.1025     0.4505    0.8525
(output omitted)
    23        1      1      0             0.0000         .          .         .
5 mg 
     6       14      1      1             0.9286    0.0688     0.5908    0.9896
     7       12      1      0             0.8512    0.0973     0.5234    0.9607
     9       11      0      1             0.8512    0.0973     0.5234    0.9607
(output omitted)
    32        1      0      1             0.2069    0.1769     0.0104    0.5804
10 mg 
     6       14      1      0             0.9286    0.0688     0.5908    0.9896
    10       13      1      0             0.8571    0.0935     0.5394    0.9622
    17       12      0      1             0.8571    0.0935     0.5394    0.9622
(output omitted)
    39        1      0      1             0.4408    0.1673     0.1312    0.7187
-------------------------------------------------------------------------------

Just as easily, you can obtain a graph

. sts graph, by(dose)
Kaplan–Meier graph

or test the equality of the survivor functions:

. sts test dose

         failure _d:  died
   analysis time _t:  studytime


Log-rank test for equality of survivor functions

        |   Events         Events
dose    |  observed       expected
--------+-------------------------
Control |        19           7.25
5 mg    |         6           8.20
10 mg   |         6          15.56
--------+-------------------------
Total   |        31          31.00

              chi2(2) =      30.19
              Pr>chi2 =     0.0000

We used the log-rank test, but we could have specified the Wilcoxon–Breslow–Gehan test, the Tarone–Ware test, the Peto–Peto–Prentice test, or the Fleming–Harrington test.

We could also perform stratified versions of these tests to control for an external covariate:

. generate agecat = 1

. replace agecat = 2 if age > 55
(25 real changes made)

. replace agecat = 3 if age > 60
(11 real changes made)

. tabulate agecat

     agecat |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         23       47.92       47.92
          2 |         14       29.17       77.08
          3 |         11       22.92      100.00
------------+-----------------------------------
      Total |         48      100.00

. sts test dose, strata(agecat)

Stratified log-rank test for equality of survivor functions

        |   Events         Events
dose    |  observed    expected(*)
--------+-------------------------
Control |        19           7.37
5 mg    |         6           9.67
10 mg   |         6          13.95
--------+-------------------------
Total   |        31          31.00

(*) sum over calculations within agecat

              chi2(2) =      29.46
              Pr>chi2 =     0.0000

Stata can fit Cox proportional hazards, exponential, Weibull, Gompertz, lognormal, log-logistic, and gamma models. In the case of the Cox proportional hazards model,

  • simple and stratified estimates are available
  • right censoring, left truncation (delayed entry), intermediary gaps are allowed
  • conventional and robust estimates of variance are available (Lin and Wei 1989)

The same is true of the parametric models. For exponential and Weibull models, estimates are available in either the accelerated-time or hazard metric.

Here we will focus on the Cox proportional hazards model using a model fitted on our dose–age data that we described above:

. stcox age i.dose

         failure _d:  died
   analysis time _t:  studytime

Iteration 0:   log likelihood = -99.911448
Iteration 1:   log likelihood = -82.331523
Iteration 2:   log likelihood = -81.676487
Iteration 3:   log likelihood = -81.652584
Iteration 4:   log likelihood = -81.652567
Refining estimates:
Iteration 0:   log likelihood = -81.652567

Cox regression -- Breslow method for ties

No. of subjects =           48                     Number of obs   =        48
No. of failures =           31
Time at risk    =          744
                                                   LR chi2(3)      =     36.52
Log likelihood  =   -81.652567                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.118334   .0409074     3.06   0.002     1.040963    1.201455
             |
        dose |
          2  |   .1805839   .0892742    -3.46   0.001     .0685292    .4758636
          3  |   .0520066    .034103    -4.51   0.000     .0143843    .1880305
------------------------------------------------------------------------------

By default, stcox uses Breslow’s method for handling ties and presents results as hazard ratios—that is, exponentiated coefficients—but we can see the underlying coefficients if we wish:

. stcox, nohr

Cox regression -- Breslow method for ties

No. of subjects =           48                     Number of obs   =        48
No. of failures =           31
Time at risk    =          744
                                                   LR chi2(3)      =     36.52
Log likelihood  =   -81.652567                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     .11184   .0365789     3.06   0.002     .0401467    .1835333
             |
        dose |
          2  |   -1.71156   .4943639    -3.46   0.001    -2.680495   -.7426241
          3  |  -2.956384   .6557432    -4.51   0.000    -4.241617   -1.671151
------------------------------------------------------------------------------

stcox can also handle ties using Efron’s method, the exact partial-likelihood method, or the exact marginal-likelihood method.

We can as easily fit the model with robust estimates of variance:

. stcox age i.dose, vce(robust)

         failure _d:  died
   analysis time _t:  studytime

Iteration 0:   log pseudolikelihood = -99.911448
Iteration 1:   log pseudolikelihood = -82.331523
Iteration 2:   log pseudolikelihood = -81.676487
Iteration 3:   log pseudolikelihood = -81.652584
Iteration 4:   log pseudolikelihood = -81.652567
Refining estimates:
Iteration 0:   log pseudolikelihood = -81.652567

Cox regression -- Breslow method for ties

No. of subjects      =           48                Number of obs   =        48
No. of failures      =           31
Time at risk         =          744
                                                   Wald chi2(3)    =     32.39
Log pseudolikelihood =   -81.652567                Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.118334   .0327643     3.82   0.000     1.055926     1.18443
             |
        dose |
          2  |   .1805839   .0773571    -4.00   0.000     .0779917    .4181288
          3  |   .0520066   .0349232    -4.40   0.000     .0139465    .1939333
------------------------------------------------------------------------------

By using predict after stcox, we can obtain the following:

  • baseline hazard (and stratified baseline hazard if estimates are stratified)
  • cumulative baseline hazard
  • baseline survivor function (and stratified baseline survivor function if estimates are stratified)
  • martingale residuals
  • Cox–Snell residuals
  • deviance residuals
  • likelihood displacement values
  • LMAX measures of influence
  • log-frailties
  • efficient score residuals
  • DFBETA measures of influence
  • Schoenfeld residuals
  • scaled Schoenfeld residuals

The data used above have censored observations but no time-varying covariates and no left truncation. The failure event—death—occurs only once. Had the data included time-varying covariates, left truncation, or recurring failure events, however, nothing would have changed in terms of what we type, and importantly, all the same features are available postestimation regardless of the characteristics of the data.

We can also fit Cox (as well as parametric) models with random effects. Known as shared frailty models in the survival literature, these models are fitted by specifying option shared():

. webuse catheter, clear
(Kidney data, McGilchrist and Aisbett, Biometrics, 1991)

. stset time infect

     failure event:  infect != 0 & infect < .
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------------------
       76  total obs.
        0  exclusions
------------------------------------------------------------------------------
       76  obs. remaining, representing
       58  failures in single record/single failure data
     7424  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       562

. stcox age female, shared(patient)

Fitting comparison Cox model:

Estimating frailty variance:

Iteration 0:   log profile likelihood = -182.06713  
Iteration 1:   log profile likelihood =  -181.9791  
Iteration 2:   log profile likelihood = -181.97453  
Iteration 3:   log profile likelihood = -181.97453  

Fitting final Cox model:

Iteration 0:   log likelihood = -199.05599
Iteration 1:   log likelihood = -183.72296
Iteration 2:   log likelihood = -181.99509
Iteration 3:   log likelihood = -181.97455
Iteration 4:   log likelihood = -181.97453
Refining estimates:
Iteration 0:   log likelihood = -181.97453

Cox regression --
         Breslow method for ties                Number of obs      =        76
         Gamma shared frailty                   Number of groups   =        38
Group variable: patient

No. of subjects =           76                  Obs per group: min =         2
No. of failures =           58                                 avg =         2
Time at risk    =         7424                                 max =         2

                                                Wald chi2(2)       =     11.66
Log likelihood  =   -181.97453                  Prob > chi2        =    0.0029

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.006202   .0120965     0.51   0.607     .9827701    1.030192
      female |   .2068678    .095708    -3.41   0.001     .0835376    .5122756
-------------+----------------------------------------------------------------
       theta |   .4754497   .2673108
------------------------------------------------------------------------------
Likelihood-ratio test of theta=0: chibar2(01) =     6.27 Prob>=chibar2 = 0.006

Note: standard errors of hazard ratios are conditional on theta.

Reference

Lin, D. Y. and L. J. Wei. 1989.
The robust inference of the Cox proportional hazards model. Journal of the American Statistical Association 84: 1074–1078.
Bookmark and Share 
Stata 12
Overview: Why use Stata?
Stata/MP
Capabilities
Overview
Sample session
User-written commands
New in Stata 12
Supported platforms
Which Stata?
Technical support
User comments
Like us on Facebook Follow us on Twitter Follow us on LinkedIn Google+ Watch us on YouTube
Follow us
© Copyright 1996–2013 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index   |   View mobile site