Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Roger B. Newson" <r.newson@imperial.ac.uk> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Developing a Predictive Risk Equation from stcox survival analysis |
Date | Wed, 19 Sep 2012 15:36:59 +0100 |
Best wishes Roger ReferencesNewson RB. Comparing the predictive power of survival models using Harrell’s c or Somers’ D. The Stata Journal 2010; 10(3): 339–358. Purchase from
http://www.stata-journal.com/article.html?article=st0198 or download pre-publication draft from http://www.imperial.ac.uk/nhli/r.newson/papers.htm#papers_in_journals Roger B Newson BSc MSc DPhil Lecturer in Medical Statistics Respiratory Epidemiology and Public Health Group National Heart and Lung Institute Imperial College London Royal Brompton Campus Room 33, Emmanuel Kaye Building 1B Manresa Road London SW3 6LR UNITED KINGDOM Tel: +44 (0)20 7352 8121 ext 3381 Fax: +44 (0)20 7351 8322 Email: r.newson@imperial.ac.uk Web page: http://www.imperial.ac.uk/nhli/r.newson/ Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/ Opinions expressed are those of the author, not of the institution. On 19/09/2012 15:03, Steve Samuels wrote:
Tom Robinson:My problem is that the when I run an estat concordance on my model I get a higher Harrel's C than I do when I run roctab on my outcome and the risks I have calculated (using the development dataset still).This is not surprising behavior: -roctab- is for binary outcomes; it ignores censoring and time. Steve On Sep 18, 2012, at 9:13 PM, Phil Clayton wrote: After running your Cox model: predict double xbeta, xb predict double basesurv, basesurv Now you need to know the baseline survivor function at 5 years. The mistake you've made is that this number is actually the same for everyone. Try this: line basesurv _t, sort You just need the point on the curve where _t==5. Since the baseline survivor function only goes down with time, this point is the minimum basesurv when time is less than 5 years: sum basesurv if _t<5 scalar base5y=r(min) Finally you can calculate each patient's risk at 5 years by adjusting the baseline risk: gen risk5y=1 - base5y^exp(xbeta) You can of course avoid the use of the scalar, but the above code makes it a little clearer what you're doing. Here's the abbreviated version: sum basesurv if _t<5 gen risk5y=1 - r(min)^exp(xbeta) A word on precision - you're raising a small number (between 0 and 1) to the exponent of another number. Therefore minor problems with precision rapidly become very big ones. This is why I have suggested using double precision for the xbeta and basesurv variables. It's also very useful to centre your covariates, so that the baseline survivor function represents an "average" patient rather than a patient with extremely or impossibly low risk. See "Making baseline reasonable" in [ST] stcox postestimation. Phil On 19/09/2012, at 6:23 AM, Tom Robinson wrote:Hi, I am using stcox to develop a predictive risk model but am unsure about how to formulate the final equation. I am using Stata 12.1 I have independent variables that were collected by family physicians as part of routine care e.g. blood pressure, lipids, renal function, demographic variables, time since developing diabetes . These come from a single review and I am using this review date as onset. The outcome is new onset of end-stage renal failure which is collected from a range of national datasets (in New Zealand). I have developed a model using stcox which I'm happy with but need to turn this into a risk prediction equation for risk at 5 years after which I can use in a validation dataset. I have centered all the variables around their mean. What I have done so far is: (following Tangri N, Stevens LA, Griffith J, et al. A predictive model for progression of chronic kidney disease to kidney failure. JAMA. 2011;305(15):1553-1559.appendix) - use predict *newvar*, xb to calculate each individuals overall hazard coefficient - confirmed for myself that this is equivalent to the sum of each variable multiplied by its coefficient from the model - confirmed that a dummy individual X with all the independent variables set at 0 (in other words at the means) has a overall hazard of 0 (*newvar *) - used predict *newvar2*, basesurv to calculate the baseline survivals - set individual X _t to 5 years which is the time period I'm interested in predicting risk at. This individuals baseline survival is Y - Used this survival in the equation gen risk5yr=1-(Y)^exp(*newvar*) to calculate each persons risk of the event at 5 years My problem is that the when I run an estat concordance on my model I get a higher Harrel's C than I do when I run roctab on my outcome and the risks I have calculated (using the development dataset still). I have also run a calibration analysis on my calculated risks which is wildly wrong (the predicted risks in each decile are about half of the actual risks) Clearly I'm doing something wrong but I can't see what. Thanks for any advice -- *Tom Robinson* * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/
* * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/