
# st: stcox and xi in Stata 12.1

From: Benno Kreuels
To: statalist@hsphsun2.harvard.edu
Subject: st: stcox and xi in Stata 12.1
Date: Fri, 22 Feb 2013 17:16:41 +0000

```
Dear statalist,

I have encountered a potential problem with the -stcox- command in Stata 12.1.
I am trying to fit a Cox regression to a single-failure dataset. The explanatory variable has three categories. The problem can be replicated using one of the example datasets provided online (webuse leukemia) and is as follows:

I stset the data by typing
. stset weeks, failure(relapse)

This gives me the output:

failure event:  relapse != 0 & relapse < .
obs. time interval:  (0, weeks]
exit on or before:  failure

------------------------------------------------------------------------------
42  total obs.
0  exclusions
------------------------------------------------------------------------------
42  obs. remaining, representing
30  failures in single record/single failure data
541  total analysis time at risk, at risk from t =         0
earliest observed entry t =         0
last observed exit t =        35

I then fit a Cox model using i.wbc3cat as the explanatory variable and obtain the following output:

. xi:stcox i.wbc3cat
i.wbc3cat         _Iwbc3cat_1-3       (naturally coded; _Iwbc3cat_1 omitted)

failure _d:  relapse
analysis time _t:  weeks

Iteration 0:   log likelihood =  -93.98505
Iteration 1:   log likelihood =  -82.79096
Iteration 2:   log likelihood = -82.109332
Iteration 3:   log likelihood = -82.100544
Iteration 4:   log likelihood = -82.100543
Refining estimates:
Iteration 0:   log likelihood = -82.100543

Cox regression -- Breslow method for ties

No. of subjects =           42                     Number of obs   =        42
No. of failures =           30
Time at risk    =          541
LR chi2(2)      =     23.77
Log likelihood  =   -82.100543                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iwbc3cat_2 |   3.499543   2.090597     2.10   0.036     1.085202    11.28527
_Iwbc3cat_3 |   14.20711   8.940021     4.22   0.000     4.138811    48.76813
------------------------------------------------------------------------------

However, if I then restrict the model to categories 1 and 2 of wbc3cat, I get a different estimate of the HR, p-value and 95% CI:

. xi:stcox i.wbc3cat if wbc3cat<3
i.wbc3cat         _Iwbc3cat_1-3       (naturally coded; _Iwbc3cat_1 omitted)

failure _d:  relapse
analysis time _t:  weeks

note: _Iwbc3cat_3 omitted because of collinearity
Iteration 0:   log likelihood = -37.480485
Iteration 1:   log likelihood = -35.003619
Iteration 2:   log likelihood =  -35.00193
Iteration 3:   log likelihood =  -35.00193
Refining estimates:
Iteration 0:   log likelihood =  -35.00193

Cox regression -- Breslow method for ties

No. of subjects =           25                     Number of obs   =        25
No. of failures =           14
Time at risk    =          431
LR chi2(1)      =      4.96
Log likelihood  =    -35.00193                     Prob > chi2     =    0.0260

------------------------------------------------------------------------------
_t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iwbc3cat_2 |   3.515159   2.105749     2.10   0.036     1.086512    11.37249
_Iwbc3cat_3 |          1  (omitted)
------------------------------------------------------------------------------

The difference in this dataset is not very large. However, in the data I am using, the HR changes from 2.09 to 1.89, and the p-value and confidence interval also change considerably. I do not understand why this happens. Using -strate- to calculate the rates gives me exactly the same results from the following two commands (for the categories included in both):

. strate wbc3cat, per(365.25)
and
. strate wbc3cat if wbc3cat<3, per(365.25)

and there is hardly any difference between the results if I use a Poisson regression by typing:

streg i.wbc3cat, dist(exp)
or
streg i.wbc3cat if wbc3cat<3, dist(exp)

I have tried to find a solution in the archives and in the Stata manual. I am afraid that I may have some misconception about the way a Cox regression model is fitted, as I am not a statistician. If that is the case, I would be grateful if someone could tell me where to find a good (and simple) explanation that would help me with this problem.
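
* Editorial sketch, not part of the original message: the commands above
* gathered into one do-file so the two Cox fits can be compared side by side.
* Assumptions: Stata 11 or later (factor-variable notation, so the -xi:-
* prefix is not needed) and internet access for -webuse-. The stored-estimate
* names "full" and "restricted" are hypothetical labels used for illustration.
webuse leukemia, clear
stset weeks, failure(relapse)

* Fit on all three categories of wbc3cat
stcox i.wbc3cat
estimates store full

* Fit on categories 1 and 2 only
stcox i.wbc3cat if wbc3cat<3
estimates store restricted

* Tabulate both fits together; eform displays exponentiated coefficients
* (hazard ratios)
estimates table full restricted, eform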