The following question and answer is based on an exchange
that started on Statalist.
What is the relationship between baseline hazard and baseline hazard
contribution?
Title: Baseline hazard and baseline hazard contribution
Author: William Gould, StataCorp
Date: October 2001; updated July 2009
Question:
In Stata’s
stcox model,
I’ve noticed that it is now possible to obtain nonparametric estimates
of the contribution to the baseline hazard (through the
basehc() option in Stata 7 to 10 or through the
postestimation command predict, basehc in Stata 11), but it is no
longer possible to get nonparametric estimates of the baseline hazard itself
(which used to be available through the basehazard()
option in Stata 6). After reading Kalbfleisch and Prentice, I’m wondering if
there is some equivocation in the use of the word “baseline”
here. What is the relationship between baseline hazard and baseline hazard
contribution?
Answer:
Yes, indeed there is some equivocation.
First, what used to be returned by the old (Stata 6)
basehazard() option is exactly what was
returned by the
basehc() option in versions 7–10
and is created now by the postestimation command predict with the option basehc.
The problem was that what was returned by the old
basehazard() option was not (and what is
returned by the new basehc() option is not)
the baseline hazard; it is the numerator of the baseline hazard, called the
hazard contribution by Kalbfleisch and Prentice (2002, p. 115, eq.
3–34). To convert what is returned to a baseline hazard, you could
divide it by Δt, the time between successive failures. But don't do that.
I did some simulations and quickly convinced myself that dividing by
Δt gives a poor estimator of the baseline hazard. Results are
much better if the estimate is based on the cumulative hazard, using
smoothing followed by numerical differentiation.
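In symbols, the relationships above can be sketched as follows. The notation is ours. We assume the product-limit construction in which the baseline survivor function is the running product of one minus the contributions. Let t_1 < t_2 < ... be the distinct failure times, h_j the hazard contribution at t_j, and Δt_j = t_j − t_{j−1}, with t_0 = 0:

```latex
\begin{aligned}
\hat S_0(t) &= \prod_{j:\,t_j \le t} \bigl(1 - h_j\bigr)
  && \text{baseline survivor function} \\
\tilde h_0(t_j) &= \frac{h_j}{\Delta t_j}
  && \text{naive estimator (the one advised against)} \\
\hat H_0(t) &= -\ln \hat S_0(t) = -\sum_{j:\,t_j \le t} \ln\bigl(1 - h_j\bigr)
  \approx \sum_{j:\,t_j \le t} h_j
  && \text{cumulative hazard} \\
h_0(t) &= \frac{d}{dt}\, H_0(t)
  && \text{target of smoothing, then differentiating}
\end{aligned}
```

The approximation in the third line holds when the individual contributions are small. Because \hat H_0 is a step function that jumps only at failure times, it has to be smoothed before it can be differentiated.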
The command
stcurve calculates and plots the smoothed hazard estimate. By default,
stcurve plots the estimate at the means of the covariates:
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. stset studytime, failure(died)
failure event: died != 0 & died < .
obs. time interval: (0, studytime]
exit on or before: failure
------------------------------------------------------------------------------
48 total obs.
0 exclusions
------------------------------------------------------------------------------
48 obs. remaining, representing
31 failures in single record/single failure data
744 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 39
. stcox drug age, nolog
failure _d: died
analysis time _t: studytime
Cox regression -- Breslow method for ties
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(2) = 36.29
Log likelihood = -81.765061 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .2153648 .0676904 -4.89 0.000 .1163154 .3987605
age | 1.116351 .0403379 3.05 0.002 1.040025 1.198279
------------------------------------------------------------------------------
. stcurve, hazard
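The covariate values and the amount of smoothing can also be chosen explicitly. The following is a minimal sketch that assumes the at() and width() options of stcurve. The values drug=1, age=50, and width(4) are arbitrary illustrations, not recommendations:

```stata
* Sketch: smoothed hazard at chosen covariate values rather than at the means.
* drug=1, age=50 and width(4) are arbitrary illustrative values; see
* -help stcurve- for the defaults in your version of Stata.
sysuse cancer, clear
stset studytime, failure(died)
stcox drug age, nolog
stcurve, hazard at(drug=1 age=50) width(4)
```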
The command stcurve uses kernel density
estimation to perform the smoothing referred to above. We can do this by
hand, using the baseline hazard contributions and the command
kdensity
to perform the smoothing:
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. stset studytime, failure(died)
failure event: died != 0 & died < .
obs. time interval: (0, studytime]
exit on or before: failure
------------------------------------------------------------------------------
48 total obs.
0 exclusions
------------------------------------------------------------------------------
48 obs. remaining, representing
31 failures in single record/single failure data
744 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 39
. stcox drug age, nolog
failure _d: died
analysis time _t: studytime
Cox regression -- Breslow method for ties
No. of subjects = 48 Number of obs = 48
No. of failures = 31
Time at risk = 744
LR chi2(2) = 36.29
Log likelihood = -81.765061 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .2153648 .0676904 -4.89 0.000 .1163154 .3987605
age | 1.116351 .0403379 3.05 0.002 1.040025 1.198279
------------------------------------------------------------------------------
. predict hc0, basehc
(17 missing values generated)
. sum drug
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
drug | 48 1.875 .8410986 1 3
. replace drug=r(mean)
drug was int now float
(48 real changes made)
. sum age
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 48 55.875 5.659205 47 67
. replace age=r(mean)
age was int now float
(48 real changes made)
. predict double xb, xb
. gen double hcmean = (1-(1-hc0)^exp(xb))
(17 missing values generated)
. drop if hc0==.
(17 observations deleted)
. sort _t
. by _t: keep if _n==1
(10 observations deleted)
. summ _t, meanonly
. local tmin = r(min)
. local tmax = r(max)
. local N = _N
. local N1 = `N' + 1
. local obs = `N'+101
. set obs `obs'
obs was 21, now 122
. gen t0 = `tmin' + (`tmax'-`tmin')*(_n-`N1')/100 ///
in `N1'/l
(21 missing values generated)
. gen t1 = t0 if t0>=4.62 & t0<=28.38
(48 missing values generated)
. kdensity _t [iweight=hcmean] if _d, at(t1) generate(hmean) nograph
. twoway line hmean t1, ytitle("") ///
xtitle("analysis time") ///
title("Smoothed hazard estimate")
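The line that creates hcmean moves the contributions from the baseline, where all covariates equal zero, to the covariate means. Here is a sketch of why. It assumes proportional hazards and a baseline survivor function equal to the product of (1 − h_j) over failure times up to t. Under those assumptions, the survivor function at covariate values x is the baseline survivor function raised to the power exp(xβ), so each factor is raised to that power. In the log, x is the vector of means, because drug and age were replaced by their means before xb was predicted:

```latex
\hat S(t \mid x) = \hat S_0(t)^{\exp(x\hat\beta)}
  = \prod_{j:\,t_j \le t} \bigl(1 - h_j\bigr)^{\exp(x\hat\beta)}
  = \prod_{j:\,t_j \le t} \bigl(1 - h_j(x)\bigr),
\qquad
h_j(x) = 1 - \bigl(1 - h_j\bigr)^{\exp(x\hat\beta)}
```

With x set to the means, h_j corresponds to hc0 and h_j(x) corresponds to hcmean.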
We can see that stcurve is doing a lot of
work for us. First, it obtains the means of the covariates and
calculates the hazard contributions at those means. Next, it creates 101 equally
spaced time points at which to calculate the smoothed hazard estimate.
Finally, it uses kdensity to do the smoothing.
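The cumulative-hazard route mentioned at the start of the answer (smoothing followed by numerical differentiation) can also be done by hand. The following rough sketch assumes Stata 11 or later. Our choices of lowess as the smoother and dydx for the derivative are not what stcurve does internally, so the curve should not be expected to match stcurve's. The basechazard option of predict supplies the baseline cumulative hazard. Under proportional hazards, multiplying it by exp(xb) gives the cumulative hazard at the covariate means:

```stata
* Sketch: smooth the cumulative hazard at the covariate means, then
* differentiate numerically. lowess and dydx are illustrative choices,
* not stcurve's internal method.
sysuse cancer, clear
stset studytime, failure(died)
stcox drug age, nolog

* Baseline cumulative hazard (covariates = 0), obtained before the
* covariates are altered, as in the by-hand example above
predict double H0, basechazard

* Move the covariates to their means, as stcurve does by default
summarize drug, meanonly
replace drug = r(mean)
summarize age, meanonly
replace age = r(mean)
predict double xb, xb

* Proportional hazards: H(t | x) = exp(xb) * H0(t)
generate double Hmean = exp(xb)*H0

* Keep one observation per distinct failure time
keep if _d
bysort _t: keep if _n == 1

* Smooth the step function, then take its numerical derivative
lowess Hmean _t, bwidth(0.8) generate(Hsmooth) nograph
sort _t
dydx Hsmooth _t, generate(hsmooth)

twoway line hsmooth _t, xtitle("analysis time") ///
    title("Hazard from smoothed cumulative hazard")
```

Nothing forces the lowess curve to be nondecreasing, so the derivative can dip below zero where failures are sparse. Widening bwidth() or restricting the plotted range, as the kdensity example does with t1, may help.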
Reference
- Kalbfleisch, J. D., and R. L. Prentice. 2002. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley.