Re: st: Baseline hazard in discrete time hazards model

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Baseline hazard in discrete time hazards model
Date	Sun, 9 Mar 2014 21:21:30 -0400

Melaku:

I recommend that you also read Stephen's book "Survival Analysis", linked to
on his website. See especially Section 3.3.1 "A discrete time
representation of a continuous time proportional hazards model". In the
book the continuous-time and discrete/interval hazards are indicated by
θ(t) and h(t), respectively.

Steve
[email protected]

On Mar 8, 2014, at 11:53 AM, [email protected] wrote:

Melaku Fekadu <[email protected]> asks about computation of the Baseline hazard in discrete time hazards model

Have a look at the Lessons on discrete time models at the website below for worked examples that derive discrete time (interval) hazard functions and survivor functions from a set of estimates. You can also read about the difference between models with and without frailty. (Extensions to the code shown, based on -predictnl-, could be used to put CIs around the estimates.)

In short, you can use -predict- after estimation of a discrete time proportional hazards model with -cloglog- (or -xtcloglog-, making additional assumptions about the frailty) in order to derive discrete/interval hazard functions and survivor functions.
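
As a rough sketch of that workflow (not the code from the Lessons themselves), the block below builds person-month data from the cancer dataset used in the original question, fits a -cloglog- model, and derives the interval hazard and survivor function. The log-time baseline (logt) and the variable names h, S, h_nl, h_lb and h_ub are assumptions made for illustration.

```stata
* Person-month data: one record per month each subject is at risk
sysuse cancer, clear
gen id = _n
expand studytime
bysort id: gen month = _n
bysort id (month): gen byte dead = died == 1 & _n == _N

* Assumption: baseline hazard specified as a function of log(month)
gen double logt = ln(month)
cloglog dead i.drug age logt, nolog

* Interval hazard h_j = 1 - exp(-exp(xb)) at each record's covariate values
predict double h, pr

* Survivor function S_j = (1 - h_1) * ... * (1 - h_j) within subject
bysort id (month): gen double S = exp(sum(ln(1 - h)))

* Delta-method CI for the hazard; the limits are not constrained to [0, 1]
predictnl double h_nl = 1 - exp(-exp(xb())), ci(h_lb h_ub)
```

To get a baseline rather than a subject-specific hazard, set the covariates to their reference values before -predict-, as the code in the original question does inside -preserve- / -restore-.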

The email by Maarten Buis that you cite raises some potentially interesting issues.

I think a distinction needs to be made between the discrete-time proportional hazards model and the underlying continuous time proportional hazard model. Coefficients on the covariates in the former (grouped data or interval-censored) model are exactly the same as the coefficients on the covariates in the latter model.  To see this, observe that, for subject i:

The continuous time PH model is of the form log(cts-hazard_it; X_it) = f(t) + b'X_it.
The interval-censored PH model is of the form cloglog(ic-hazard_ij; X_ij) = g(j) + b'X_ij,
where t is continuous survival time and j indexes the intervals (e.g. months or years).

Note that the baseline hazard of the continuous time PH model, f(t), cannot be identified from estimates of the discrete time model without additional assumptions. The baseline hazard of the discrete time hazard model, g(j), is not the same as f(t).    
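
The link between g(j) and f(t) can be written out explicitly. The following is a sketch under two assumptions that are not stated in the post: interval j covers (a_{j-1}, a_j], and the covariates are constant within each interval. The boundaries a_j and the survival time T are notation introduced here.

```latex
\begin{align*}
h_{ij} &= 1 - \Pr(T_i > a_j \mid T_i > a_{j-1}, X_{ij})
        = 1 - \exp\!\left(-e^{b'X_{ij}} \int_{a_{j-1}}^{a_j} e^{f(u)}\,du\right), \\
\log\!\left(-\log(1 - h_{ij})\right) &= b'X_{ij} + g(j),
\qquad g(j) = \log \int_{a_{j-1}}^{a_j} e^{f(u)}\,du .
\end{align*}
```

So the same b appears in both models, while g(j) only pins down the integral of the continuous-time baseline hazard over each interval, not its shape within the interval.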

If you wanted to assume that the baseline hazard of the underlying cts time PH model really was Weibull (say), then it is possible to derive estimates of the Weibull shape parameter and the "b" coefficients from interval-censored data. But you have to code your own likelihood to do it. That is, the standard "easy estimation" method for fitting discrete time PH models (expand the data so that there is one record for each interval in which each subject is at risk of the event, and then fit a -cloglog- model) doesn't work for this purpose. If you don't have any covariates that vary with survival time, then some modules are available. See e.g. -intcens- on SSC.
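
One way such a likelihood might be coded with -ml- is sketched below. It assumes unit-length intervals (j-1, j] and a Weibull integrated hazard exp(xb) * t^p, so that the integrated hazard over interval j is exp(xb) * (j^p - (j-1)^p). The program name wbl_ic_lf and the equation names xb and lnp are hypothetical, and convergence is not guaranteed on other data.

```stata
* Person-month data built as in the original question
sysuse cancer, clear
gen id = _n
expand studytime
bysort id: gen month = _n
bysort id (month): gen byte dead = died == 1 & _n == _N

* Log likelihood for one person-month record (method lf).
* $ML_y1 = event indicator, $ML_y2 = interval index j.
capture program drop wbl_ic_lf
program define wbl_ic_lf
    args lnf xb lnp
    tempvar H
    * Integrated hazard over (j-1, j]: exp(xb) * (j^p - (j-1)^p), p = exp(lnp)
    quietly gen double `H' = exp(`xb') * ///
        ($ML_y2^exp(`lnp') - ($ML_y2 - 1)^exp(`lnp'))
    * Event in the interval: ln(1 - exp(-H)); survived the interval: -H
    quietly replace `lnf' = cond($ML_y1 == 1, ln(1 - exp(-`H')), -`H')
end

ml model lf wbl_ic_lf (xb: dead month = drug age) (lnp:)
ml maximize
display "Weibull shape p = " exp([lnp]_b[_cons])
```

The shape parameter is estimated on the log scale to keep it positive. Because the likelihood is written per person-month record, covariates that vary across intervals can be included in the xb equation.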



Stephen
------------------
Stephen P. Jenkins <[email protected]> 
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis

------------------------------

Date: Sat, 8 Mar 2014 02:19:01 +0200
From: Melaku Fekadu <[email protected]>
Subject: st: Baseline hazard in discrete time hazards model

Dear Statalisters,

My question is about the calculation of baseline hazard in discrete time hazards model. I want to mention that I have really looked through previous posts in this list and Stata help and manual before I ask here.

Am I wrong to assume that I can use -predict- to calculate baseline hazard after estimating a -cloglog- model? I ask this because in an earlier post (attached below) there was an answer to a similar question where it is said that one has to exponentiate the xb (linear combination) because -cloglog- models the log hazard.

I thought that -cloglog- models the hazard as follows

h = 1-exp(-exp(xb)).

Whereas the previous post suggested using

h = exp(xb) to calculate the baseline hazard.

I will be happy if anyone can elaborate on this. Is this fine to use -predict- which I assume gives h = 1-exp(-exp(xb)) for baseline hazard (assuming, of course, zero or mean values for other covariates)?
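
The two formulas can be compared directly. -cloglog- fits cloglog(h) = xb, so -predict- with its default statistic returns 1-exp(-exp(xb)), whereas exp(xb) is the integrated hazard over the interval. A small illustration, with arbitrarily chosen values of xb:

```stata
* Compare the two candidate formulas over a range of linear predictors.
* invcloglog(x) is Stata's built-in 1 - exp(-exp(x)).
foreach xb of numlist -4 -2 0 1 {
    display "xb = " %5.1f `xb' ///
        "   1-exp(-exp(xb)) = " %8.6f invcloglog(`xb') ///
        "   exp(xb) = " %8.6f exp(`xb')
}
```

The two agree closely when the hazard is small and diverge as xb grows; exp(xb) can exceed 1, while the interval hazard cannot.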

My other question is that -predict- gives very similar baseline hazards for cloglog and xtcloglog. I used -predict hxt, pu0- for xtcloglog (see syntax below). The estimated coefficients are also similar. I am afraid that I am missing something. I appreciate any elaboration on this too. The estimated baseline hazards in the two models are:

halfyr   cloglog    xtcloglog
1        0.154997   0.154974
2        0.265045   0.264967
3        0.275298   0.275166
4        0.561485   0.561224
5        0.734756   0.734525
6        0.748665   0.748422

I used the following syntax to calculate the baseline hazards.

sysuse cancer, clear
gen id = _n

expand studytime
bysort id : gen month = _n

// the addition (month) makes sure that _n==_N means
// the last month for that individual
bysort id (month) : gen byte dead = died==1 & _n==_N
lab var dead "binary depvar for discrete hazard model"

// further trick to save you some typing
gen halfyr = ceil(month/6)
tab halfyr, gen(dur)
replace halfyr = 6 if halfyr == 7
replace dur6 = dur6 + dur7
drop dur7

// when looking at the baseline hazard you need to make
// sure it refers to a meaningful group by making sure
// that the value 0 for all your explanatory variables
// refers to a meaningful value within the range of the
// data, here I centered age at 50 years.
gen c_age = age - 50

cloglog dead drug c_age dur1 dur2 dur3 dur4 dur5 dur6, ///
       nocons nolog
preserve
replace drug = 0
replace c_age = 0
predict h
tab halfyr, summarize(h) means
restore

xtcloglog dead drug c_age dur1 dur2 dur3 dur4 dur5 dur6, nolog nocons i(id)
// for the baseline here I assume zero random effect

preserve
replace drug = 0
replace c_age = 0
predict hxt, pu0
tab halfyr, summarize(hxt) means
restore

Thanks a lot,
Melaku


/////////////////////////////////////
The previous post about a similar question:

Re: st: Recovering the discrete time (interval) baseline hazard function