Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Developing a Predictive Risk Equation from stcox survival analysis


From   Tom Robinson <tomrobnz@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Developing a Predictive Risk Equation from stcox survival analysis
Date   Thu, 20 Sep 2012 21:43:46 +1200

Thanks very much for the advice, it has all been very helpful



Tom

On 20 September 2012 02:36, Roger B. Newson <r.newson@imperial.ac.uk> wrote:
> I would second Steve's diagnosis. To find more about the issues with the use
> of Harrell's c with predictive scores after -stcox- (or even after -streg-),
> see Newson (2010).
>
> Best wishes
>
> Roger
>
> References
>
> Newson RB. Comparing the predictive power of survival models using Harrell’s
> c or Somers’ D. The Stata Journal 2010; 10(3): 339–358. Purchase from
> http://www.stata-journal.com/article.html?article=st0198
> or download pre-publication draft from
> http://www.imperial.ac.uk/nhli/r.newson/papers.htm#papers_in_journals
>
>
> Roger B Newson BSc MSc DPhil
> Lecturer in Medical Statistics
> Respiratory Epidemiology and Public Health Group
> National Heart and Lung Institute
> Imperial College London
> Royal Brompton Campus
> Room 33, Emmanuel Kaye Building
> 1B Manresa Road
> London SW3 6LR
> UNITED KINGDOM
> Tel: +44 (0)20 7352 8121 ext 3381
> Fax: +44 (0)20 7351 8322
> Email: r.newson@imperial.ac.uk
> Web page: http://www.imperial.ac.uk/nhli/r.newson/
> Departmental Web page:
> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
>
> Opinions expressed are those of the author, not of the institution.
>
>
> On 19/09/2012 15:03, Steve Samuels wrote:
>>
>> Tom Robinson:
>>
>>> My problem is that the when I run an estat concordance on my model I get
>>> a
>>> higher Harrel's C than I do when I run roctab on my outcome and the risks
>>> I have calculated (using the development dataset still).
>>
>>
>>
>> This is not surprising behavior: -roctab- is for binary outcomes; it
>> ignores censoring and time.
>>
>> Steve
>>
>>
>>
>> On Sep 18, 2012, at 9:13 PM, Phil Clayton wrote:
>>
>> After running your Cox model:
>> predict double xbeta, xb
>> predict double basesurv, basesurv
>>
>> Now you need to know the baseline survivor function at 5 years. The
>> mistake you've made is that this number is actually the same for everyone.
>> Try this:
>> line basesurv _t, sort
>>
>> You just need the point on the curve where _t==5. Since the baseline
>> survivor function only goes down with time, this point is the minimum
>> basesurv when time is less than 5 years:
>> sum basesurv if _t<5
>> scalar base5y=r(min)
>>
>> Finally you can calculate each patient's risk at 5 years by adjusting the
>> baseline risk:
>> gen risk5y=1 - base5y^exp(xbeta)
>>
>> You can of course avoid the use of the scalar, but the above code makes it
>> a little clearer what you're doing. Here's the abbreviated version:
>> sum basesurv if _t<5
>> gen risk5y=1 - r(min)^exp(xbeta)
>>
>> A word on precision - you're raising a small number (between 0 and 1) to
>> the exponent of another number. Therefore minor problems with precision
>> rapidly become very big ones. This is why I have suggested using double
>> precision for the xbeta and basesurv variables. It's also very useful to
>> centre your covariates, so that the baseline survivor function represents an
>> "average" patient rather than a patient with extremely or impossibly low
>> risk. See "Making baseline reasonable" in [ST] stcox postestimation.
>>
>> Phil
>>
>> On 19/09/2012, at 6:23 AM, Tom Robinson wrote:
>>
>>> Hi,
>>>
>>> I am using stcox to develop a predictive risk model but am unsure about
>>> how
>>> to formulate the final equation.  I am using Stata 12.1
>>>
>>> I have independent variables that were collected by family physicians as
>>> part of routine care e.g. blood pressure, lipids, renal function,
>>> demographic variables, time since developing diabetes . These come from a
>>> single review and I am using this review date as onset. The outcome is
>>> new
>>> onset of end-stage renal failure which is collected from a range of
>>> national datasets (in New Zealand).
>>>
>>> I have developed a model using stcox which I'm happy with but need to
>>> turn
>>> this into a risk prediction equation for risk at 5 years after which I
>>> can
>>> use in a validation dataset. I have centered all the variables around
>>> their
>>> mean.
>>>
>>> What I have done so far is: (following Tangri N, Stevens LA, Griffith J,
>>> et
>>> al. A predictive model for progression of chronic kidney disease to
>>> kidney
>>> failure. JAMA. 2011;305(15):1553-1559.appendix)
>>>
>>>   - use predict *newvar*, xb to calculate each individuals overall hazard
>>>   coefficient
>>>   - confirmed for myself that this is equivalent to the sum of each
>>>   variable multiplied by its coefficient from the model
>>>   - confirmed that a dummy individual X with all the independent
>>> variables
>>>   set at 0 (in other words at the means) has a overall hazard of 0
>>> (*newvar
>>>   *)
>>>   - used predict *newvar2*, basesurv to calculate the baseline survivals
>>>   - set individual X _t to 5 years which is the time period I'm
>>> interested
>>>   in predicting risk at.  This individuals baseline survival is Y
>>>   - Used this survival in the equation gen risk5yr=1-(Y)^exp(*newvar*) to
>>>   calculate each persons risk of the event at 5 years
>>>
>>> My problem is that the when I run an estat concordance on my model I get
>>> a
>>> higher Harrel's C than I do when I run roctab on my outcome and the risks
>>> I have calculated (using the development dataset still).
>>>
>>> I have also run a calibration analysis on my calculated risks which is
>>> wildly wrong (the predicted risks in each decile are about half of the
>>> actual risks)
>>>
>>> Clearly I'm doing something wrong but I can't see what.  Thanks for any
>>> advice
>>>
>>> --
>>> *Tom Robinson*
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 
Tom Robinson
ph. 445 2056
mob. 021 482 301

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index