This FAQ is relevant for users of Stata 10 or earlier. It is not relevant for newer versions.
Why does stcox sometimes produce missing standard errors?
| Title | Missing standard errors reported by stcox |
| Author | Mario Cleves, StataCorp |
| Date | October 1999 |
There are two major reasons for missing standard errors in a Cox proportional
hazards regression. The first is failure to converge. Although this is rare,
if the message “nonconcave function encountered” or “unproductive step
attempted” appears in the last step of the iteration log, then the
estimation procedure did not converge to the MLE and the results cannot be
trusted.
Missing standard errors in a Cox proportional hazards regression, however, are
more often due to one of four types of collinearity:
1) Covariate is collinear with the dead/censor variable.
This results in a hazard ratio of infinity (a very large number is printed)
and a missing standard error if there is positive collinearity, or a hazard
ratio of zero (a large negative coefficient, printed as a number very close
to zero) and a missing standard error if there is negative collinearity.
. webuse cancer
(Patient Survival in Drug Trial)
. stset studytime, f(died)
failure event: died != 0 & died < .
obs. time interval: (0, studytime]
exit on or before: failure
------------------------------------------------------------------------------
48 total obs.
0 exclusions
------------------------------------------------------------------------------
48 obs. remaining, representing
31 failures in single record/single failure data
744 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 39
. generate copy=_d
. stcox age drug copy, exactp nolog
failure _d: died
analysis time _t: studytime
Cox regression -- exact partial likelihood
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(3) = 59.38
Log likelihood = -62.481243 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.090842    .042937     2.21   0.027     1.009851    1.178328
        drug |   .2851362   .1067876    -3.35   0.001     .1368565    .5940725
        copy |   5.28e+15          .        .       .            .           .
------------------------------------------------------------------------------
. generate negcopy=-_d
. stcox age drug negcopy, exactp nolog
failure _d: died
analysis time _t: studytime
Cox regression -- exact partial likelihood
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(3) = 59.38
Log likelihood = -62.481243 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.090842    .042937     2.21   0.027     1.009851    1.178328
        drug |   .2851362   .1067876    -3.35   0.001     .1368565    .5940725
     negcopy |   1.89e-16          .        .       .            .           .
------------------------------------------------------------------------------
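A quick way to screen for this first form is to compare the suspect covariate with the failure indicator _d that stset creates. The following is a minimal sketch written as do-file commands (no dot prompt, no output shown). It reuses the cancer data and the artificial variable copy from the example above; the choice of tabulate and correlate as screening tools is an assumption of this sketch, not part of the original FAQ.

```stata
* Rebuild the example data; copy is the artificial covariate used above.
webuse cancer, clear
stset studytime, failure(died)
generate copy = _d

* Empty off-diagonal cells mean the covariate perfectly predicts failure.
tabulate copy _d

* A correlation of +1 or -1 with _d signals the same problem.
correlate copy _d
display "correlation with _d = " r(rho)
```

The correlation only catches exact copies or linear recodings of _d, so the two-way table is the more informative of the two checks.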
2) Covariate is collinear with the time variable.
This results in a hazard ratio close to one (coefficient is close to zero)
and a missing standard error.
. clear
. set obs 1000
obs was 0, now 1000
. generate t=_n
. stset t
failure event: (assumed to fail at time=t)
obs. time interval: (0, t]
exit on or before: failure
------------------------------------------------------------------------------
1000 total obs.
0 exclusions
------------------------------------------------------------------------------
1000 obs. remaining, representing
1000 failures in single record/single failure data
500500 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1000
. generate copy=_t
. stcox copy
failure _d: 1 (meaning all fail)
analysis time _t: t
Iteration 0: log likelihood = -5912.1282
Iteration 1: log likelihood = -4537.5754
Iteration 2: log likelihood = -3821.8484
Iteration 3: log likelihood = -3430.1547
Iteration 4: log likelihood = -3427.9073
Iteration 5: log likelihood = -3344.6335
Refining estimates:
Iteration 0: log likelihood = -3312.0701
Iteration 1: log likelihood = -2920.7381
Iteration 2: log likelihood = -2709.5843
Iteration 3: log likelihood = -2701.5327
Cox regression -- no ties
No. of subjects = 1000 Number of obs = 1000
No. of failures = 1000
Time at risk = 500500
LR chi2(1) = 6421.19
Log likelihood = -2701.5327 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        copy |   .9343625          .        .       .            .           .
------------------------------------------------------------------------------
3) Covariate is collinear with the entry-time variable.
This results in a hazard ratio close to one (coefficient is close to zero)
and a missing standard error.
. clear
. set obs 1000
obs was 0, now 1000
. generate t0=_n-5
. generate t=_n
. stset t, enter(t0)
failure event: (assumed to fail at time=t)
obs. time interval: (0, t]
enter on or after: time t0
exit on or before: failure
------------------------------------------------------------------------------
1000 total obs.
0 exclusions
------------------------------------------------------------------------------
1000 obs. remaining, representing
1000 failures in single record/single failure data
4990 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1000
. generate copy=_t0
. stcox copy
failure _d: 1 (meaning all fail)
analysis time _t: t
enter on or after: time t0
Iteration 0: log likelihood = -1606.1782
Iteration 1: log likelihood = -1545.3983
Iteration 2: log likelihood = -1540.1655
Refining estimates:
Iteration 0: log likelihood = -1541.2987
Iteration 1: log likelihood = -1484.0017
Iteration 2: log likelihood = -1473.3656
Iteration 3: log likelihood = -1470.0384
Iteration 4: log likelihood = -1469.6364
Iteration 5: log likelihood = -1463.425
Cox regression -- no ties
No. of subjects = 1000 Number of obs = 1000
No. of failures = 1000
Time at risk = 4990
LR chi2(1) = 285.51
Log likelihood = -1463.425 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        copy |   .9293137          .        .       .            .           .
------------------------------------------------------------------------------
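The second and third forms can be screened by comparing the covariate with the analysis-time variables _t and _t0 that stset creates. The sketch below, written as do-file commands without output, rebuilds the delayed-entry example above and correlates copy with both variables. Using correlate for this purpose is an assumption of the sketch; it detects only linear dependence, so a correlation near +1 or -1 is a warning sign rather than a formal test.

```stata
* Rebuild the delayed-entry example; copy duplicates the entry time _t0.
clear
set obs 1000
generate t0 = _n - 5
generate t = _n
stset t, enter(t0)
generate copy = _t0

* Correlations of the covariate with exit time (_t) and entry time (_t0).
correlate copy _t _t0
```

In the correlation matrix, the entries in the copy column show how closely the covariate tracks the exit and entry times.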
4) Covariate does not vary within death event risk sets.
This is a complicated form of collinearity wherein a covariate varies
overall, but for each death event, it does not vary within the
associated risk set.
This results in a hazard ratio of one (coefficient is zero) and a missing
standard error.
. clear
. input id t0 t dead x
id t0 t dead x
1. 1 0 1 1 6.18
2. 2 0.5 1 1 6.18
3. 3 1 6 1 5.55
4. 4 3 7 0 5.55
5. end
. stset t, failure(dead) enter(t0)
failure event: dead != 0 & dead < .
obs. time interval: (0, t]
enter on or after: time t0
exit on or before: failure
------------------------------------------------------------------------------
4 total obs.
0 exclusions
------------------------------------------------------------------------------
4 obs. remaining, representing
3 failures in single record/single failure data
10.5 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 7
. list
     +-------------------------------------------------+
     | id   t0   t   dead      x   _st   _d   _t   _t0 |
     |-------------------------------------------------|
  1. |  1    0   1      1   6.18     1    1    1     0 |
  2. |  2   .5   1      1   6.18     1    1    1    .5 |
  3. |  3    1   6      1   5.55     1    1    6     1 |
  4. |  4    3   7      0   5.55     1    0    7     3 |
     +-------------------------------------------------+
. stcox x
failure _d: dead
analysis time _t: t
enter on or after: time t0
Iteration 0: log likelihood = -2.0794415
Refining estimates:
Iteration 0: log likelihood = -2.0794415
Cox regression -- Breslow method for ties
No. of subjects = 4 Number of obs = 4
No. of failures = 3
Time at risk = 10.5
LR chi2(1) = 0.00
Log likelihood = -2.0794415 Prob > chi2 = 1.0000
------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          1          .        .       .            .           .
------------------------------------------------------------------------------
Coefficients for the variables that have (any form of) collinearity cannot
be estimated. Leaving them in or deleting them from the model results in
the same likelihood value and does not alter the results for the
noncollinear variables.
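For the risk-set example above, the likelihood statement can be checked from the results that stcox stores: e(ll) holds the log likelihood of the fitted model and e(ll_0) that of the model with no covariates. The sketch below is a do-file version of that check using the same four observations; the output above already reports LR chi2(1) = 0.00, so the two values are expected to agree.

```stata
* Re-enter the four-observation example with delayed entry.
clear
input id t0 t dead x
1 0   1 1 6.18
2 0.5 1 1 6.18
3 1   6 1 5.55
4 3   7 0 5.55
end
stset t, failure(dead) enter(t0)

* Fit the model with x and compare it with the no-covariate log likelihood.
stcox x, nolog
display "with x: " e(ll) "   without covariates: " e(ll_0)
```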
Although the first three forms of collinearity can be easily assessed, the
fourth requires that the appropriate risk sets be formed. This task is
facilitated by the program st_rpool, written by Bill Gould, which can be
downloaded from Stata’s website.
To obtain st_rpool, type in Stata:
. net from http://www.stata.com
. net cd users/wgould
. net describe st_rpool
. net install st_rpool
or, using the menus:
- from the Help menu, select SJ and User-written Programs
- click on Other locations
- click on users
- click on wgould
- click on st_rpool
- finally, click on “click here to install”
Let’s use st_rpool to look at the values of the covariate x in
the risk sets:
. clear
. input id t0 t dead x
id t0 t dead x
1. 1 0 1 1 6.18
2. 2 0.5 1 1 6.18
3. 3 1 6 1 5.55
4. 4 3 7 0 5.55
5. end
. stset t, failure(dead) enter(t0)
failure event: dead != 0 & dead < .
obs. time interval: (0, t]
enter on or after: time t0
exit on or before: failure
------------------------------------------------------------------------------
4 total obs.
0 exclusions
------------------------------------------------------------------------------
4 obs. remaining, representing
3 failures in single record/single failure data
10.5 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 7
. list
     +-------------------------------------------------+
     | id   t0   t   dead      x   _st   _d   _t   _t0 |
     |-------------------------------------------------|
  1. |  1    0   1      1   6.18     1    1    1     0 |
  2. |  2   .5   1      1   6.18     1    1    1    .5 |
  3. |  3    1   6      1   5.55     1    1    6     1 |
  4. |  4    3   7      0   5.55     1    0    7     3 |
     +-------------------------------------------------+
. st_rpool set
. sort set id
. list, sepby(set)
     +--------------------------------------+
     | id   t0      x   _d   _t   _t0   set |
     |--------------------------------------|
  1. |  1    0   6.18    1    1     0     1 |
  2. |  2   .5   6.18    1    1    .5     1 |
     |--------------------------------------|
  3. |  3    1   5.55    1    6     1     2 |
  4. |  4    3   5.55    0    7     3     2 |
     +--------------------------------------+
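In the listing, x takes a single value within each risk set (6.18 in set 1, 5.55 in set 2), which is exactly the fourth form of collinearity. With many risk sets, the same inspection can be automated. The sketch below is one possible approach and not part of st_rpool: so that it runs on its own, it enters the pooled listing by hand instead of calling st_rpool, and the variable name x_varies is hypothetical.

```stata
* Pooled risk-set data as listed above (id, x, and the set identifier only).
clear
input id x set
1 6.18 1
2 6.18 1
3 5.55 2
4 5.55 2
end

* Within each risk set, sort by x and compare the smallest and largest value.
bysort set (x): generate byte x_varies = (x[1] != x[_N])

* Count records belonging to risk sets in which x takes more than one value.
count if x_varies
display "records in risk sets where x varies: " r(N)
```

A count of zero means that x is constant within every risk set, so its coefficient cannot be estimated.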