Survival example
The input data for the survival-analysis commands are duration records:
each observation records a span of time over which the subject was observed,
along with an outcome at the end of the period. There can be one record per
subject or, if covariates vary over time, multiple records.
You can obtain simple descriptions:
. webuse cancer
(Patient Survival in Drug Trial)
. stset study died
failure event: died != 0 & died < .
obs. time interval: (0, studytime]
exit on or before: failure
------------------------------------------------------------------------------
48 total obs.
0 exclusions
------------------------------------------------------------------------------
48 obs. remaining, representing
31 failures in single record/single failure data
744 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 39
. rename drug dose
. label define dosage 1 "Control" 2 "5 mg" 3 "10 mg"
. label values dose dosage
. stdescribe
|-------------- per subject --------------|
Category total mean min median max
------------------------------------------------------------------------------
no. of subjects 48
no. of records 48 1 1 1 1
(first) entry time 0 0 0 0
(final) exit time 15.5 1 12.5 39
subjects with gap 0
time on gap if gap 0
time at risk 744 15.5 1 12.5 39
failures 31 .6458333 0 1 1
------------------------------------------------------------------------------
. stsum, by(dose)
| incidence no. of |------ Survival time -----|
dose | time at risk rate subjects 25% 50% 75%
---------+---------------------------------------------------------------------
Control | 180 .1055556 20 4 8 12
5 mg | 209 .0287081 14 13 22 23
10 mg | 355 .0169014 14 25 33 .
---------+---------------------------------------------------------------------
total | 744 .0416667 48 8 17 33
You can also compare the survivor functions
. sts list, by(dose) compare
-----Survivor Function------
dose Control 5 mg 10 mg
---------------------------------------------
time 1 0.9000 1.0000 1.0000
5 0.6000 1.0000 1.0000
9 0.4500 0.8512 0.9286
13 0.2250 0.7448 0.8571
17 0.1125 0.6207 0.8571
21 0.1125 0.6207 0.8571
25 . 0.2069 0.6857
29 . 0.2069 0.5878
33 . . 0.4408
37 . . 0.4408
41 . . .
---------------------------------------------
or you can review the complete life table:
. sts list, by(dose)
Beg. Net Survivor Std.
Time Total Fail Lost Function Error [95% Conf. Int.]
-------------------------------------------------------------------------------
Control
1 20 2 0 0.9000 0.0671 0.6560 0.9740
2 18 1 0 0.8500 0.0798 0.6038 0.9490
3 17 1 0 0.8000 0.0894 0.5511 0.9198
4 16 2 0 0.7000 0.1025 0.4505 0.8525
(output omitted)
23 1 1 0 0.0000 . . .
5 mg
6 14 1 1 0.9286 0.0688 0.5908 0.9896
7 12 1 0 0.8512 0.0973 0.5234 0.9607
9 11 0 1 0.8512 0.0973 0.5234 0.9607
(output omitted)
32 1 0 1 0.2069 0.1769 0.0104 0.5804
10 mg
6 14 1 0 0.9286 0.0688 0.5908 0.9896
10 13 1 0 0.8571 0.0935 0.5394 0.9622
17 12 0 1 0.8571 0.0935 0.5394 0.9622
(output omitted)
39 1 0 1 0.4408 0.1673 0.1312 0.7187
-------------------------------------------------------------------------------
Just as easily, you can obtain a graph
. sts graph, by(dose)
or test the equality of the survivor functions:
. sts test dose
failure _d: died
analysis time _t: studytime
Log-rank test for equality of survivor functions
| Events Events
dose | observed expected
--------+-------------------------
Control | 19 7.25
5 mg | 6 8.20
10 mg | 6 15.56
--------+-------------------------
Total | 31 31.00
chi2(2) = 30.19
Pr>chi2 = 0.0000
We used the log-rank test, but we could have specified the
Wilcoxon–Breslow–Gehan test, the Tarone–Ware test, the
Peto–Peto–Prentice test, or the Fleming–Harrington test.
We could also perform stratified versions of these tests to control for an
external covariate:
. generate agecat = 1
. replace agecat = 2 if age > 55
(25 real changes made)
. replace agecat = 3 if age > 60
(11 real changes made)
. tabulate agecat
agecat | Freq. Percent Cum.
------------+-----------------------------------
1 | 23 47.92 47.92
2 | 14 29.17 77.08
3 | 11 22.92 100.00
------------+-----------------------------------
Total | 48 100.00
. sts test dose, strata(agecat)
Stratified log-rank test for equality of survivor functions
| Events Events
dose | observed expected(*)
--------+-------------------------
Control | 19 7.37
5 mg | 6 9.67
10 mg | 6 13.95
--------+-------------------------
Total | 31 31.00
(*) sum over calculations within agecat
chi2(2) = 29.46
Pr>chi2 = 0.0000
Stata can fit Cox proportional hazards, exponential, Weibull, Gompertz,
lognormal, log-logistic, and gamma models. In the case of the Cox
proportional hazards model,
- simple and stratified estimates are available
- right censoring, left truncation (delayed entry), intermediary gaps are
allowed
- conventional and robust estimates of variance are
available (Lin and Wei 1989)
The same is true of the parametric models. For exponential and Weibull
models, estimates are available in either the accelerated-time or hazard
metric.
Here we will focus on the Cox proportional hazards model using a model
fitted on our dose–age data that we described above:
. stcox age i.dose
failure _d: died
analysis time _t: studytime
Iteration 0: log likelihood = -99.911448
Iteration 1: log likelihood = -82.331523
Iteration 2: log likelihood = -81.676487
Iteration 3: log likelihood = -81.652584
Iteration 4: log likelihood = -81.652567
Refining estimates:
Iteration 0: log likelihood = -81.652567
Cox regression -- Breslow method for ties
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(3) = 36.52
Log likelihood = -81.652567 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 1.118334 .0409074 3.06 0.002 1.040963 1.201455
|
dose |
2 | .1805839 .0892742 -3.46 0.001 .0685292 .4758636
3 | .0520066 .034103 -4.51 0.000 .0143843 .1880305
------------------------------------------------------------------------------
By default, stcox uses Breslow’s method for handling ties and
presents results as hazard ratios—that is, exponentiated
coefficients—but we can see the underlying coefficients if we wish:
. stcox, nohr
Cox regression -- Breslow method for ties
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(3) = 36.52
Log likelihood = -81.652567 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .11184 .0365789 3.06 0.002 .0401467 .1835333
|
dose |
2 | -1.71156 .4943639 -3.46 0.001 -2.680495 -.7426241
3 | -2.956384 .6557432 -4.51 0.000 -4.241617 -1.671151
------------------------------------------------------------------------------
stcox can also handle ties using Efron’s method, the exact
partial-likelihood method, or the exact marginal-likelihood method.
We can as easily fit the model with robust estimates of variance:
. stcox age i.dose, vce(robust)
failure _d: died
analysis time _t: studytime
Iteration 0: log pseudolikelihood = -99.911448
Iteration 1: log pseudolikelihood = -82.331523
Iteration 2: log pseudolikelihood = -81.676487
Iteration 3: log pseudolikelihood = -81.652584
Iteration 4: log pseudolikelihood = -81.652567
Refining estimates:
Iteration 0: log pseudolikelihood = -81.652567
Cox regression -- Breslow method for ties
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
Wald chi2(3) = 32.39
Log pseudolikelihood = -81.652567 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 1.118334 .0327643 3.82 0.000 1.055926 1.18443
|
dose |
2 | .1805839 .0773571 -4.00 0.000 .0779917 .4181288
3 | .0520066 .0349232 -4.40 0.000 .0139465 .1939333
------------------------------------------------------------------------------
By using predict after stcox, we can obtain
the following:
- baseline hazard (and stratified baseline hazard if estimates are stratified)
- cumulative baseline hazard
- baseline survivor function (and stratified baseline survivor function if estimates are stratified)
- martingale residuals
- Cox–Snell residuals
- deviance residuals
- likelihood displacement values
- LMAX measures of influence
- log-frailties
- efficient score residuals
- DFBETA measures of influence
- Schoenfeld residuals
- scaled Schoenfeld residuals
The data used above have censored observations but no time-varying
covariates and no left truncation. The failure
event—death—occurs only once. Had the data included
time-varying covariates, left truncation, or recurring failure events,
however, nothing would have changed in terms of what we type, and
importantly, all the same features are available postestimation regardless
of the characteristics of the data.
We can also fit Cox (as well as parametric) models with random effects.
Known as shared frailty models in the survival literature, these
models are fitted by specifying option shared():
. webuse catheter, clear
(Kidney data, McGilchrist and Aisbett, Biometrics, 1991)
. stset time infect
failure event: infect != 0 & infect < .
obs. time interval: (0, time]
exit on or before: failure
------------------------------------------------------------------------------
76 total obs.
0 exclusions
------------------------------------------------------------------------------
76 obs. remaining, representing
58 failures in single record/single failure data
7424 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 562
. stcox age female, shared(patient)
Fitting comparison Cox model:
Estimating frailty variance:
Iteration 0: log profile likelihood = -182.06713
Iteration 1: log profile likelihood = -181.9791
Iteration 2: log profile likelihood = -181.97453
Iteration 3: log profile likelihood = -181.97453
Fitting final Cox model:
Iteration 0: log likelihood = -199.05599
Iteration 1: log likelihood = -183.72296
Iteration 2: log likelihood = -181.99509
Iteration 3: log likelihood = -181.97455
Iteration 4: log likelihood = -181.97453
Refining estimates:
Iteration 0: log likelihood = -181.97453
Cox regression --
Breslow method for ties Number of obs = 76
Gamma shared frailty Number of groups = 38
Group variable: patient
No. of subjects = 76 Obs per group: min = 2
No. of failures = 58 avg = 2
Time at risk = 7424 max = 2
Wald chi2(2) = 11.66
Log likelihood = -181.97453 Prob > chi2 = 0.0029
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 1.006202 .0120965 0.51 0.607 .9827701 1.030192
female | .2068678 .095708 -3.41 0.001 .0835376 .5122756
-------------+----------------------------------------------------------------
theta | .4754497 .2673108
------------------------------------------------------------------------------
Likelihood-ratio test of theta=0: chibar2(01) = 6.27 Prob>=chibar2 = 0.006
Note: standard errors of hazard ratios are conditional on theta.
Reference
- Lin, D. Y. and L. J. Wei. 1989.
- The robust inference of the Cox proportional hazards model.
Journal of the American Statistical Association
84: 1074–1078.
|