Home  /  Products  /  Features  /  Survival example

The input data for the survival-analysis features are duration records: each observation records a span of time over which the subject was observed, along with an outcome at the end of the period. There can be one record per subject or, if covariates vary over time, multiple records.

You can obtain simple descriptions:

. webuse cancer
(Patient survival in drug trial)

. stset study died

Survival-time data settings

         Failure event: died!=0 & died<.
Observed time interval: (0, studytime]
     Exit on or before: failure

48 total observations
0 exclusions
48 observations remaining, representing
31 failures in single-record/single-failure data
744 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 39
. rename drug dose . label define dosage 1 "Control" 2 "5 mg" 3 "10 mg" . label values dose dosage . stdescribe Failure _d: died Analysis time _t: studytime
Per subject
Category Total Mean Min Median Max
Number of subjects 48
Number of records 48 1 1 1 1
 
Entry time (first) 0 0 0 0
Exit time (final) 15.5 1 12.5 39
 
Subjects with gap 0
Time on gap 0
Time at risk 744 15.5 1 12.5 39
 
Failures 31 .6458333 0 1 1
. stsum, by(dose) Failure _d: died Analysis time _t: studytime
Incidence Number of Survival time
dose Time at risk rate subjects 25% 50% 75%
Control 180 .1055556 20 4 8 12
5 mg 209 .0287081 14 13 22 23
10 mg 355 .0169014 14 25 33 .
Total 744 .0416667 48 8 17 33

You can also compare the survivor functions

. sts list, by(dose) compare

        Failure _d: died
  Analysis time _t: studytime

Kaplan–Meier survivor function
By variable: dose

dose Control 5 mg 10 mg
Time 1 0.9000 1.0000 1.0000
5 0.6000 1.0000 1.0000
9 0.4500 0.8512 0.9286
13 0.2250 0.7448 0.8571
17 0.1125 0.6207 0.8571
21 0.1125 0.6207 0.8571
25 . 0.2069 0.6857
29 . 0.2069 0.5878
33 . . 0.4408
37 . . 0.4408
41 . . .

or you can review the complete life table:

. sts list, by(dose)

        Failure _d: died
  Analysis time _t: studytime

Kaplan–Meier survivor function
By variable: dose

At Net Survivor Std.
Time risk Fail lost function error [95% conf. int.]
Control
1 20 2 0 0.9000 0.0671 0.6560 0.9740
2 18 1 0 0.8500 0.0798 0.6038 0.9490
3 17 1 0 0.8000 0.0894 0.5511 0.9198
4 16 2 0 0.7000 0.1025 0.4505 0.8525
(output omitted)
23 1 1 0 0.0000 . . .
5 mg
6 14 1 1 0.9286 0.0688 0.5908 0.9896
7 12 1 0 0.8512 0.0973 0.5234 0.9607
9 11 0 1 0.8512 0.0973 0.5234 0.9607
10 10 0 1 0.8512 0.0973 0.5234 0.9607
(output omitted)
32 1 0 1 0.2069 0.1769 0.0104 0.5804
10 mg
6 14 1 0 0.9286 0.0688 0.5908 0.9896
10 13 1 0 0.8571 0.0935 0.5394 0.9622
17 12 0 1 0.8571 0.0935 0.5394 0.9622
19 11 0 1 0.8571 0.0935 0.5394 0.9622
(output omitted)
39 1 0 1 0.4408 0.1673 0.1312 0.7187
Note: Net lost equals the number lost minus the number who entered.

Just as easily, you can obtain a graph

. sts graph, by(dose)

or test the equality of the survivor functions:

. sts test dose

        Failure _d: died
  Analysis time _t: studytime

Equality of survivor functions
Log-rank test

Observed Expected
dose events events
Control 19 7.25
5 mg 6 8.20
10 mg 6 15.56
Total 31 31.00
chi2(2) = 30.19 Pr>chi2 = 0.0000

We used the log-rank test, but we could have specified the Wilcoxon–Breslow–Gehan test, the Tarone–Ware test, the Peto–Peto–Prentice test, or the Fleming–Harrington test.

We could also perform stratified versions of these tests to control for an external covariate:

. generate agecat = 1

. replace agecat = 2 if age > 55
(25 real changes made)

. replace agecat = 3 if age > 60
(11 real changes made)

. tabulate agecat

agecat Freq. Percent Cum.
1 23 47.92 47.92
2 14 29.17 77.08
3 11 22.92 100.00
Total 48 100.00
. sts test dose, strata(agecat) Failure _d: died Analysis time _t: studytime Equality of survivor functions Stratified log-rank test Strata: agecat
Observed Expected
dose events events*
Control 19 7.37
5 mg 6 9.67
10 mg 6 13.95
Total 31 31.00
* aggregated over stratum-specific results chi2(2) = 29.46 Pr>chi2 = 0.0000

Stata can fit Cox proportional hazards, exponential, Weibull, Gompertz, lognormal, log-logistic, and gamma models. In the case of the Cox proportional hazards model,

  • simple and stratified estimates are available

  • right censoring, left truncation (delayed entry), intermediary gaps are allowed

  • conventional and robust estimates of variance are available (Lin and Wei 1989)

The same is true of the parametric models. For exponential and Weibull models, estimates are available in either the accelerated-time or hazard metric.

Here we will focus on the Cox proportional hazards model using a model fitted on our dose–age data that we described above:

. stcox age i.dose

        Failure _d: died
  Analysis time _t: studytime

Iteration 0:   log likelihood = -99.911448
Iteration 1:   log likelihood = -82.331523
Iteration 2:   log likelihood = -81.676487
Iteration 3:   log likelihood = -81.652584
Iteration 4:   log likelihood = -81.652567
Refining estimates:
Iteration 0:   log likelihood = -81.652567

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        LR chi2(3)    =  36.52
Log likelihood = -81.652567                             Prob > chi2   = 0.0000

_t Haz. ratio Std. err. z P>|z| [95% conf. interval]
age 1.118334 .0409074 3.06 0.002 1.040963 1.201455
dose
5 mg .1805839 .0892742 -3.46 0.001 .0685292 .4758636
10 mg .0520066 .034103 -4.51 0.000 .0143843 .1880305

By default, stcox uses Breslow’s method for handling ties and presents results as hazard ratios—that is, exponentiated coefficients—but we can see the underlying coefficients if we wish:

. stcox, nohr

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        LR chi2(3)    =  36.52
Log likelihood = -81.652567                             Prob > chi2   = 0.0000

_t Coefficient Std. err. z P>|z| [95% conf. interval]
age .11184 .0365789 3.06 0.002 .0401467 .1835333
dose
5 mg -1.71156 .4943639 -3.46 0.001 -2.680495 -.7426241
10 mg -2.956384 .6557432 -4.51 0.000 -4.241617 -1.671151

stcox can also handle ties using Efron’s method, the exact partial-likelihood method, or the exact marginal-likelihood method.

We can as easily fit the model with robust estimates of variance:

. stcox age i.dose, vce(robust)

        Failure _d: died
  Analysis time _t: studytime

Iteration 0:   log pseudolikelihood = -99.911448
Iteration 1:   log pseudolikelihood = -82.331523
Iteration 2:   log pseudolikelihood = -81.676487
Iteration 3:   log pseudolikelihood = -81.652584
Iteration 4:   log pseudolikelihood = -81.652567
Refining estimates:
Iteration 0:   log pseudolikelihood = -81.652567

Cox regression with Breslow method for ties

No. of subjects =  48                                   Number of obs =     48
No. of failures =  31
Time at risk    = 744
                                                        Wald chi2(3)  =  32.39
Log pseudolikelihood = -81.652567                       Prob > chi2   = 0.0000

Robust
_t Haz. ratio std. err. z P>|z| [95% conf. interval]
age 1.118334 .0327643 3.82 0.000 1.055926 1.18443
dose
5 mg .1805839 .0773571 -4.00 0.000 .0779917 .4181288
10 mg .0520066 .0349232 -4.40 0.000 .0139465 .1939333

By using predict after stcox, we can obtain the following:

  • baseline hazard (and stratified baseline hazard if estimates are stratified)

  • cumulative baseline hazard

  • baseline survivor function (and stratified baseline survivor function if estimates are stratified)

  • martingale residuals

  • Cox–Snell residuals

  • deviance residuals

  • likelihood displacement values

  • LMAX measures of influence

  • log-frailties

  • efficient score residuals

  • DFBETA measures of influence

  • Schoenfeld residuals

  • scaled Schoenfeld residuals

The data used above have censored observations but no time-varying covariates and no left truncation. The failure event—death—occurs only once. Had the data included time-varying covariates, left truncation, or recurring failure events, however, nothing would have changed in terms of what we type, and importantly, all the same features are available postestimation regardless of the characteristics of the data.

We can also fit Cox (as well as parametric) models with random effects. Known as shared frailty models in the survival literature, these models are fitted by specifying option shared():

. webuse catheter, clear
(Kidney data, McGilchrist and Aisbett, Biometrics, 1991)

. stset time infect

Survival-time data settings

         Failure event: infect!=0 & infect<.
Observed time interval: (0, time]
     Exit on or before: failure

76 total observations
0 exclusions
76 observations remaining, representing
58 failures in single-record/single-failure data
7,424 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 562
. stcox age female, shared(patient) Failure _d: infect Analysis time _t: time Fitting comparison Cox model ... Estimating frailty variance: Iteration 0: log profile likelihood = -182.06713 Iteration 1: log profile likelihood = -181.9791 Iteration 2: log profile likelihood = -181.97453 Iteration 3: log profile likelihood = -181.97453 Fitting final Cox model: Iteration 0: log likelihood = -199.05599 Iteration 1: log likelihood = -183.72296 Iteration 2: log likelihood = -181.99509 Iteration 3: log likelihood = -181.97455 Iteration 4: log likelihood = -181.97453 Refining estimates: Iteration 0: log likelihood = -181.97453 Cox regression with Breslow method for ties Gamma shared frailty Number of obs = 76 Group variable: patient Number of groups = 38 Obs per group: No. of subjects = 76 min = 2 No. of failures = 58 avg = 2 Time at risk = 7,424 max = 2 Wald chi2(2) = 11.66 Log likelihood = -181.97453 Prob > chi2 = 0.0029
_t Haz. ratio Std. err. z P>|z| [95% conf. interval]
age 1.006202 .0120965 0.51 0.607 .9827701 1.030192
female .2068678 .095708 -3.41 0.001 .0835376 .5122756
theta .4754497 .2673108
LR test of theta=0: chibar2(01) = 6.27 Prob >= chibar2 = 0.006 Note: Standard errors of hazard ratios are conditional on theta.

Reference

Lin, D. Y. and L. J. Wei. 1989. The robust inference of the Cox proportional hazards model. Journal of the American Statistical Association 84: 1074–1078.