Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Steve Samuels <sjsamuels@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Developing a Predictive Risk Equation from stcox survival analysis |

Date |
Wed, 19 Sep 2012 10:03:26 -0400 |

Tom Robinson: > My problem is that the when I run an estat concordance on my model I get a > higher Harrel's C than I do when I run roctab on my outcome and the risks > I have calculated (using the development dataset still). This is not surprising behavior: -roctab- is for binary outcomes; it ignores censoring and time. Steve On Sep 18, 2012, at 9:13 PM, Phil Clayton wrote: After running your Cox model: predict double xbeta, xb predict double basesurv, basesurv Now you need to know the baseline survivor function at 5 years. The mistake you've made is that this number is actually the same for everyone. Try this: line basesurv _t, sort You just need the point on the curve where _t==5. Since the baseline survivor function only goes down with time, this point is the minimum basesurv when time is less than 5 years: sum basesurv if _t<5 scalar base5y=r(min) Finally you can calculate each patient's risk at 5 years by adjusting the baseline risk: gen risk5y=1 - base5y^exp(xbeta) You can of course avoid the use of the scalar, but the above code makes it a little clearer what you're doing. Here's the abbreviated version: sum basesurv if _t<5 gen risk5y=1 - r(min)^exp(xbeta) A word on precision - you're raising a small number (between 0 and 1) to the exponent of another number. Therefore minor problems with precision rapidly become very big ones. This is why I have suggested using double precision for the xbeta and basesurv variables. It's also very useful to centre your covariates, so that the baseline survivor function represents an "average" patient rather than a patient with extremely or impossibly low risk. See "Making baseline reasonable" in [ST] stcox postestimation. Phil On 19/09/2012, at 6:23 AM, Tom Robinson wrote: > Hi, > > I am using stcox to develop a predictive risk model but am unsure about how > to formulate the final equation. I am using Stata 12.1 > > I have independent variables that were collected by family physicians as > part of routine care e.g. blood pressure, lipids, renal function, > demographic variables, time since developing diabetes . These come from a > single review and I am using this review date as onset. The outcome is new > onset of end-stage renal failure which is collected from a range of > national datasets (in New Zealand). > > I have developed a model using stcox which I'm happy with but need to turn > this into a risk prediction equation for risk at 5 years after which I can > use in a validation dataset. I have centered all the variables around their > mean. > > What I have done so far is: (following Tangri N, Stevens LA, Griffith J, et > al. A predictive model for progression of chronic kidney disease to kidney > failure. JAMA. 2011;305(15):1553-1559.appendix) > > - use predict *newvar*, xb to calculate each individuals overall hazard > coefficient > - confirmed for myself that this is equivalent to the sum of each > variable multiplied by its coefficient from the model > - confirmed that a dummy individual X with all the independent variables > set at 0 (in other words at the means) has a overall hazard of 0 (*newvar > *) > - used predict *newvar2*, basesurv to calculate the baseline survivals > - set individual X _t to 5 years which is the time period I'm interested > in predicting risk at. This individuals baseline survival is Y > - Used this survival in the equation gen risk5yr=1-(Y)^exp(*newvar*) to > calculate each persons risk of the event at 5 years > > My problem is that the when I run an estat concordance on my model I get a > higher Harrel's C than I do when I run roctab on my outcome and the risks > I have calculated (using the development dataset still). > > I have also run a calibration analysis on my calculated risks which is > wildly wrong (the predicted risks in each decile are about half of the > actual risks) > > Clearly I'm doing something wrong but I can't see what. Thanks for any > advice > > -- > *Tom Robinson* > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: Developing a Predictive Risk Equation from stcox survival analysis***From:*"Roger B. Newson" <r.newson@imperial.ac.uk>

**References**:**st: Developing a Predictive Risk Equation from stcox survival analysis***From:*Tom Robinson <tomrobnz@gmail.com>

**Re: st: Developing a Predictive Risk Equation from stcox survival analysis***From:*Phil Clayton <philclayton@internode.on.net>

- Prev by Date:
**st: Which matrix to use for continuous variables when using the xtmixed command** - Next by Date:
**st: Create Variable containing Generations per Household** - Previous by thread:
**Re: st: Developing a Predictive Risk Equation from stcox survival analysis** - Next by thread:
**Re: st: Developing a Predictive Risk Equation from stcox survival analysis** - Index(es):