# Re: st: repeat: predict after -streg-

 From wgould@stata.com (William Gould, StataCorp LP) To statalist@hsphsun2.harvard.edu Subject Re: st: repeat: predict after -streg- Date Thu, 12 Jul 2007 09:25:52 -0500

```Jeph Herrin <junk@spandrel.net> writes,

> My data are time to readmission (failure) after
> hospitalization, with censoring on death:
>

>    obs. time interval:  (0, ftime]
>      exit on or before:  failure

>    -------------------------------------------------------------------
>        424787  total obs.
>             0  exclusions
>    -------------------------------------------------------------------
>        424787  obs. remaining, representing
>         95529  failures in single record/single failure data
>      1.07e+07  total analysis time at risk, at risk from t =         0
>                                  earliest observed entry t =         0
>                                       last observed exit t =        30
>
> My model is
>
>      . streg `mycovars' , d(lnormal)
>
> And my problem is:
>
>       . predict median, median time
>       . count if median<=30
>          30
>
> So, though the data have 95,000+ failures in less than 30 days, my
> model predicts only 30 such failures.
> [...]
> Should I expect this?

It's possible.  Over the period 0 to 30, you had only 95,529 failures in
424,787 subjects, which is 95,529/424,787 = 22.5%.  Thus, the median survival
time in your data is greater than 30.

I agree with Jeph that the model obviously did not fit the subjects observed
to fail well.  That could just indicate that `mycovars' does not explain the
variation in the data well (which is different from saying that the covariates
are not significiantly different from zero).  For instance, imagine rather
than fitting

. streg `mycovars', d(lnnormal)

. streg, d(lnnormal)

That would be an constant-only model.  Typing

. predict median, median time

after that would result in the same prediction for everyone, and that
prediction would most certainly be greater than 30.

So my advice is "accept the result" or "worry about specification" or
"find better covariates".  (Concerning "worry about specification", who
says age is adequate and don't need to include an effect for age>40.
Or include age^2.  Or use fractional polynomials?)

Before Jeph takes my advice, however, he will want to make sure that he
does not have a technical problem.  So far, I have merely said that
nothing Jeph typed looks like an error to me and given the output Jeph
has shared, the results shown are consistent with there being no data
processing errors.

First, Jeph should type

. stsum

and

. sts list

and verify that the output is consistent with what he knows about his
data.  After that, I would also type

. summarize _t if _d==0

and

. summarize _t if _d==1

Jeph should verify that the recorded analytic times for the censored and
the failed are consistent with what he knows about his data.

Finally, I would type

. summarize median if _d==0

and

. summarize median if _d==1

I would certainly want to see that the predicted median times for the
censored group are, on average, greater than for the failed group.
I am assumming here that most of the failed group failed before time 30
and that most of the censored were censored at time 30.

Like I said, I suspect that the only problem Jeph is lack of explanatory
power.  Among `mycovars', Jeph may have uncovered variable or variables
that have a significant effect, perhaps even an important effect, one
survival, but even so, there is still a lot left to explain.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```