[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: repeat: predict after -streg-

From (William Gould, StataCorp LP)
Subject   Re: st: repeat: predict after -streg-
Date   Thu, 12 Jul 2007 09:25:52 -0500

Jeph Herrin <> writes, 

> My data are time to readmission (failure) after
> hospitalization, with censoring on death:
>    -> stset ftime, failure(radm)

>          failure event:  radm != 0 & radm < .
>    obs. time interval:  (0, ftime]
>      exit on or before:  failure

>    -------------------------------------------------------------------
>        424787  total obs.
>             0  exclusions
>    -------------------------------------------------------------------
>        424787  obs. remaining, representing
>         95529  failures in single record/single failure data
>      1.07e+07  total analysis time at risk, at risk from t =         0
>                                  earliest observed entry t =         0
>                                       last observed exit t =        30
> My model is
>      . streg `mycovars' , d(lnormal)
> And my problem is:
>       . predict median, median time
>       . count if median<=30
>          30
> So, though the data have 95,000+ failures in less than 30 days, my
> model predicts only 30 such failures.
> [...]
> Should I expect this?

It's possible.  Over the period 0 to 30, you had only 95,529 failures in
424,787 subjects, which is 95,529/424,787 = 22.5%.  Thus, the median survival
time in your data is greater than 30.  

I agree with Jeph that the model obviously did not fit the subjects observed 
to fail well.  That could just indicate that `mycovars' does not explain the 
variation in the data well (which is different from saying that the covariates
are not significiantly different from zero).  For instance, imagine rather
than fitting

        . streg `mycovars', d(lnnormal)

Jeph had fitted 

        . streg, d(lnnormal)

That would be an constant-only model.  Typing 

        . predict median, median time 

after that would result in the same prediction for everyone, and that 
prediction would most certainly be greater than 30.

So my advice is "accept the result" or "worry about specification" or 
"find better covariates".  (Concerning "worry about specification", who 
says age is adequate and don't need to include an effect for age>40.  
Or include age^2.  Or use fractional polynomials?)

Before Jeph takes my advice, however, he will want to make sure that he 
does not have a technical problem.  So far, I have merely said that 
nothing Jeph typed looks like an error to me and given the output Jeph 
has shared, the results shown are consistent with there being no data 
processing errors.

First, Jeph should type 

        . stsum 


        . sts list

and verify that the output is consistent with what he knows about his 
data.  After that, I would also type 

        . summarize _t if _d==0


        . summarize _t if _d==1 

Jeph should verify that the recorded analytic times for the censored and 
the failed are consistent with what he knows about his data.

Finally, I would type 

        . summarize median if _d==0


        . summarize median if _d==1

I would certainly want to see that the predicted median times for the 
censored group are, on average, greater than for the failed group.
I am assumming here that most of the failed group failed before time 30
and that most of the censored were censored at time 30.

Like I said, I suspect that the only problem Jeph is lack of explanatory 
power.  Among `mycovars', Jeph may have uncovered variable or variables 
that have a significant effect, perhaps even an important effect, one 
survival, but even so, there is still a lot left to explain.

-- Bill
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index