Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Validity of hazard predictions in frailtymodels


From   "Steinar Fossedal" <Steinar.Fossedal@nhh.no>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Validity of hazard predictions in frailtymodels
Date   Tue, 29 May 2007 14:52:30 +0200

Hi, listers

I have fitted an exponential survival model with gamma frailty, and now
find myself in a pickle trying to interpret and apply the predictive
results. Specifically I'm getting predictions of individual hazard that
exceeds one. Such estimates are, of course, problematic to use in the
next step of my analysis.

I reason that this is caused by the multiplicative effect of the frailty
parameter on the hazard, and that the model only ensures the validity of
the population hazard values - not the unobserved individual's. With
validity I mean restrictions that ensure the hazard stays between zero
and one. To me, this seems like an inherent weakness in frailty models.
This may not matter much when investigating hazard ratios and
differences between populations, but it does pose a problem when making
individual predictions. 

My question is then, is there any known way of handling this problem,
and can it be done in Stata?

My only alternative seems to be the rather week options of declaring the
problematic observations outliers based on the extreme values of their
covariates. However I would like to avoid this as they are not really
outliers, only observations in the tail of the distribution.

I hope you can help.

-Steinar

PS.
Needless to say, this problem should also carry over to the survival
function, but strangely enough I have found no odd predictions there -
all individual survival estimates are between zero and one. An example
follows below. All numbers represent individual predictions. csurv_a
represents Stata's prediction of cumulative survival using -predict
csurv_a, csurv alpha1-. The column csurv_a_est is my estimation of
cumulative survival, calculated by the following two lines:

bysort id (time): gen csurv_a_est=(1-haz_a)
bysort id (time): replace csurv_a_est=csurv_a_est[_n-1]*(1-haz_a)

As you can see, the results do not match Stata's in this case (although
they fit very well with haz_a<1).

time	haz_a		surv_a	csurv_a	csurv_a_est
1	1.225237	.2936882	.5071228	-.2252365
2	1.225237	.2936882	.3689136	.0507315
3	1.225237	.2936882	.2990291	-.0114266
4	1.225237	.2936882	.2555756	.0025737
5	1.225237	.2936882	.2254425	-.0005797
6	1.225237	.2936882	.2030775	.0001306
7	1.225237	.2936882	.1856877	-.0000294
8	1.225237	.2936882	.1717002	6.62e-06
9	1.133204	.3220001	.1602065	-8.82e-07
10	1.133204	.3220001	.1505179	1.18e-07
11	1.133204	.3220001	.1422167	-1.57e-08
12	1.133204	.3220001	.135008	2.09e-09
13	1.133204	.3220001	.128677	-2.78e-10
14	1.133204	.3220001	.1230631	3.70e-11
15	1.225237	.2936882	.1180301	-8.33e-12
..

Any comments would be appreciated.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index