Re: st: repeat: predict after -streg-

 From Jeph Herrin To statalist@hsphsun2.harvard.edu Subject Re: st: repeat: predict after -streg- Date Thu, 12 Jul 2007 14:33:03 -0400

```Bill, Sam:

This is exactly the insight I was missing. Of course the
predicted median is high, my crude median is high. And
yes, the model is not great, even though all of my covars
are highly (P<0.0001) significant.

Any other thoughts on how to handle this? We are creating
hospital rankings based on 30-day readmission rates; the
problem is that hospitals which discharge patients too
soon may be losing them to mortality before they have a
chance to be readmitted, so I want to account for the
competing risk of death vs readmission. Unfortunately,
we can only follow them for 30 days, during which only

The answer seems to be, I can get estimated median time to
readmission for each hospital, adjusted for my covars
(perhaps refined), but these will not translate to 30day
readmission rates because my model does nothing to discriminate
between those readmitted before 30 days and those readmitted
after, it only explains incremental differences in time
to failure.

thanks,
Jeph

William Gould, StataCorp LP wrote:
```
Jeph Herrin <junk@spandrel.net> writes,

```My data are time to readmission (failure) after
hospitalization, with censoring on death:

-> stset ftime, failure(radm)
```

```         failure event:  radm != 0 & radm < .
obs. time interval:  (0, ftime]
exit on or before:  failure
```

```   -------------------------------------------------------------------
424787  total obs.
0  exclusions
-------------------------------------------------------------------
424787  obs. remaining, representing
95529  failures in single record/single failure data
1.07e+07  total analysis time at risk, at risk from t =         0
earliest observed entry t =         0
last observed exit t =        30

My model is

. streg `mycovars' , d(lnormal)

And my problem is:

. predict median, median time
. count if median<=30
30

So, though the data have 95,000+ failures in less than 30 days, my
model predicts only 30 such failures.
[...]
Should I expect this?
```
It's possible. Over the period 0 to 30, you had only 95,529 failures in
424,787 subjects, which is 95,529/424,787 = 22.5%. Thus, the median survival
time in your data is greater than 30.
I agree with Jeph that the model obviously did not fit the subjects observed to fail well. That could just indicate that `mycovars' does not explain the variation in the data well (which is different from saying that the covariates
are not significiantly different from zero). For instance, imagine rather
than fitting

. streg `mycovars', d(lnnormal)

. streg, d(lnnormal)

That would be an constant-only model. Typing
. predict median, median time
after that would result in the same prediction for everyone, and that prediction would most certainly be greater than 30.

So my advice is "accept the result" or "worry about specification" or "find better covariates". (Concerning "worry about specification", who says age is adequate and don't need to include an effect for age>40. Or include age^2. Or use fractional polynomials?)

Before Jeph takes my advice, however, he will want to make sure that he does not have a technical problem. So far, I have merely said that nothing Jeph typed looks like an error to me and given the output Jeph has shared, the results shown are consistent with there being no data processing errors.

First, Jeph should type
. stsum
and
. sts list

and verify that the output is consistent with what he knows about his data. After that, I would also type
. summarize _t if _d==0

and
. summarize _t if _d==1
Jeph should verify that the recorded analytic times for the censored and the failed are consistent with what he knows about his data.

Finally, I would type
. summarize median if _d==0

and

. summarize median if _d==1

I would certainly want to see that the predicted median times for the censored group are, on average, greater than for the failed group.
I am assumming here that most of the failed group failed before time 30
and that most of the censored were censored at time 30.

Like I said, I suspect that the only problem Jeph is lack of explanatory power. Among `mycovars', Jeph may have uncovered variable or variables that have a significant effect, perhaps even an important effect, one survival, but even so, there is still a lot left to explain.

-- Bill
wgould@stata.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

```*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```