Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.

# Re: st: Developing a Predictive Risk Equation from stcox survival analysis

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Developing a Predictive Risk Equation from stcox survival analysis
Date: Wed, 19 Sep 2012 10:03:26 -0400

```
Tom Robinson:

> My problem is that when I run an estat concordance on my model I get a
> higher Harrell's C than I do when I run roctab on my outcome and the risks
> I have calculated (using the development dataset still).

This is not surprising behavior: -roctab- is for binary outcomes; it ignores censoring and time.

Steve

On Sep 18, 2012, at 9:13 PM, Phil Clayton wrote:

predict double xbeta, xb
predict double basesurv, basesurv

Now you need to know the value of the baseline survivor function at 5 years. What you've missed is that this number is actually the same for everyone. Try this:
line basesurv _t, sort

You just need the point on the curve where _t==5. Since the baseline survivor function only goes down with time, this point is the minimum basesurv among observations with time at or before 5 years:
sum basesurv if _t<=5
scalar base5y=r(min)

Finally you can calculate each patient's risk at 5 years by adjusting the baseline risk:
gen risk5y=1 - base5y^exp(xbeta)

You can of course avoid the use of the scalar, but the above code makes it a little clearer what you're doing. Here's the abbreviated version:
sum basesurv if _t<=5
gen risk5y=1 - r(min)^exp(xbeta)

A word on precision: you're raising a number between 0 and 1 to the power exp(xbeta), so minor problems with precision rapidly become very big ones. This is why I have suggested using double precision for the xbeta and basesurv variables. It's also very useful to centre your covariates, so that the baseline survivor function represents an "average" patient rather than a patient with extremely or impossibly low risk. See "Making baseline reasonable" in [ST] stcox postestimation.

Phil

On 19/09/2012, at 6:23 AM, Tom Robinson wrote:

> Hi,
>
> I am using stcox to develop a predictive risk model but am unsure about how
> to formulate the final equation.  I am using Stata 12.1
>
> I have independent variables that were collected by family physicians as
> part of routine care e.g. blood pressure, lipids, renal function,
> demographic variables, time since developing diabetes. These come from a
> single review and I am using this review date as onset. The outcome is new
> onset of end-stage renal failure which is collected from a range of
> national datasets (in New Zealand).
>
> I have developed a model using stcox which I'm happy with but need to turn
> this into a risk prediction equation for risk at 5 years, which I can then
> use in a validation dataset. I have centered all the variables around their
> mean.
>
> What I have done so far is: (following Tangri N, Stevens LA, Griffith J, et
> al. A predictive model for progression of chronic kidney disease to kidney
> failure. JAMA. 2011;305(15):1553-1559.appendix)
>
>  - use predict *newvar*, xb to calculate each individual's overall hazard
>  coefficient
>  - confirmed for myself that this is equivalent to the sum of each
>  variable multiplied by its coefficient from the model
>  - confirmed that a dummy individual X with all the independent variables
>  set at 0 (in other words at the means) has an overall hazard of 0
>  (*newvar*)
>  - used predict *newvar2*, basesurv to calculate the baseline survivals
>  - set individual X _t to 5 years which is the time period I'm interested
>  in predicting risk at.  This individual's baseline survival is Y
>  - Used this survival in the equation gen risk5yr=1-(Y)^exp(*newvar*) to
>  calculate each person's risk of the event at 5 years
>
> My problem is that when I run an estat concordance on my model I get a
> higher Harrell's C than I do when I run roctab on my outcome and the risks
> I have calculated (using the development dataset still).
>
> I have also run a calibration analysis on my calculated risks which is
> wildly wrong (the predicted risks in each decile are about half of the
> actual risks)
>
> Clearly I'm doing something wrong but I can't see what.  Thanks for any
>
> --
> *Tom Robinson*
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
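The steps discussed in the thread can be pulled together in one short do-file. This is only a sketch under stated assumptions, not the original poster's analysis: it uses Stata's shipped example dataset `drugtr` (already `stset`, with `studytime` in months), so the horizon is 20 months rather than 5 years, and the names `age_c`, `base20`, and `risk20` are made up for illustration. It relies on the Cox relationship that a patient's survival at time t is the baseline survival at t raised to the power exp(xb), so predicted risk is 1 minus that quantity.

```stata
* Illustration only: drugtr is a Stata example dataset, not the thread's data
webuse drugtr, clear

* Centre the continuous covariate so the baseline describes an "average" patient
summarize age, meanonly
generate double age_c = age - r(mean)

* Fit the Cox model (drug is a 0/1 indicator and is left uncentred here)
stcox drug age_c

* Linear predictor and baseline survivor function, both in double precision
predict double xbeta, xb
predict double basesurv, basesurv

* Baseline survival at the horizon: the lowest value of the step function
* among observations at or before t = 20 (an assumed horizon, in months)
summarize basesurv if _t <= 20, meanonly
scalar base20 = r(min)

* Predicted risk by the horizon for each patient
generate double risk20 = 1 - base20^exp(xbeta)
summarize risk20
```

To apply the same equation to a validation dataset, one would carry over the fitted coefficients, the centring constants, and the single baseline value (here `base20`), rather than re-estimating any of them on the new data.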