Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: ROC-curves

From	"Seed, Paul" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: ROC-curves
Date	Tue, 22 Oct 2013 09:20:25 +0000

Roger is absolutely right.
Where the data permit, using separate training and
validation sets will remove the bias associated with overfitting.

Where there are too few cases to permit this, the validation may have to
take place in another data set, and will constitute a fresh study.

Beyond that, it may be necessary to validate in a completely fresh cohort, gathered
under different circumstances, to show how widely the test can be applied.


BW

Paul 

Paul T Seed, Senior Lecturer in Medical Statistics, 
Division of Women's Health, King's College London
Women's Health Academic Centre, King's Health Partners 
(+44) (0) 20 7188 3642.
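The training/validation split described above (and in Roger's message below) can be sketched in Stata roughly as follows. This is a minimal illustration, not code from the thread: the data are simulated, and the variable names x1, x2 and train, like the 50/50 split, are assumptions chosen for the example. The model is fitted to the training half only, and -roctab- then estimates the ROC area and its 95% CI from out-of-sample predictions in the validation half.

```stata
* Simulated data: x1 and x2 are hypothetical stand-ins for the tests,
* outcome is a 0/1 outcome (all names are illustrative only)
clear
set seed 20131022
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate byte outcome = runiform() < invlogit(-1 + x1 + 0.5*x2)

* Randomly allocate about half of the observations to the training set
generate byte train = runiform() < 0.5

* Fit the logistic model to the training set only
logistic outcome x1 x2 if train

* -predict- fills in all observations, so for train == 0 the
* predicted probabilities are genuinely out-of-sample
predict pred, pr

* In-sample ROC area (likely to be optimistic), for comparison
roctab outcome pred if train

* Out-of-sample ROC area with 95% CI, in the validation set only
roctab outcome pred if !train
```

Comparing the two -roctab- results shows the size of the optimism. With only two predictors and 1,000 simulated observations the gap may well be small; overfitting matters most when there are many predictors relative to the number of events.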


------------------------------
> 
> Date: Mon, 21 Oct 2013 22:08:40 +0100
> From: "Roger B. Newson" <[email protected]>
> Subject: Re: st: ROC-curves
> 
> The main problem with confidence intervals for the area under a ROC curve
> generated from a logistic regression is that, if you estimate your ROC curve
> from the same data in which you fitted your logistic regression model,
> then you will probably be over-optimistic, as the parameters have been
> chosen to fit specifically that set of data. If you want your ROC area
> to have confidence limits which you can really be confident about, then
> it is a good idea to randomize your data into a training set and a test
> set, and to fit your logistic model to the training set, and to estimate
> its ROC area using out-of-sample prediction in the test set.
> 
> Newson (2010) discusses these issues with Cox regression and other
> survival models. As stated in the first paragraph of Section 5 of this
> reference, the procedure with non-survival models (like logistic
> regression) is similar, but simpler.
> 
> I hope this helps.
> 
> Best wishes
> 
> Roger
> 
> References
> 
> Newson RB. Comparing the predictive power of survival models using
> Harrell's c or Somers' D. The Stata Journal 2010; 10(3): 339-358.
> Download from
> http://www.stata-journal.com/article.html?article=st0198
> 
> Roger B Newson BSc MSc DPhil
> Lecturer in Medical Statistics
> Respiratory Epidemiology and Public Health Group
> National Heart and Lung Institute
> Imperial College London
> Royal Brompton Campus
> Room 33, Emmanuel Kaye Building
> 1B Manresa Road
> London SW3 6LR
> UNITED KINGDOM
> Tel: +44 (0)20 7352 8121 ext 3381
> Fax: +44 (0)20 7351 8322
> Email: [email protected]
> Web page: http://www.imperial.ac.uk/nhli/r.newson/
> Departmental Web page:
> http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
> 
> Opinions expressed are those of the author, not of the institution.
> 
> On 18/10/2013 09:23, Seed, Paul wrote:
> > On 14/10/2013 18:54, Ragnhild Bergene Skråstad wrote:
> >   > Hi!
> >   > I am investigating how different tests, in combination, can predict a given
> >   > outcome.
> >   >
> >   > I have made a logistic model with the command "logistic" and plotted
> >   > the ROC curve with the command "lroc". This gave me the ROC curve and
> >   > the AUC. I wonder:
> >   > - how can I get the 95 % CI for this AUC?
> >   > and
> >   > - I would like to get the sensitivity at a given fixed false-positive
> >   > rate. Do I have to get all the coordinates on the ROC curve and identify
> >   > the one at the FPR of interest, and if so, how do I do that? Or is there a
> >   > direct way to do this?
> >   > best wishes
> >   > Ragnhild B Skråstad
> >
> > The simplest way to get a CI for the area under a ROC curve following logistic regression
> > is to use -predict- and -roctab-:
> >
> > * Start Stata commands *
> > logistic outcome <predictors>
> > capture drop pred
> > predict pred
> > roctab outcome pred
> >
> > * End Stata commands *
> >
> > * Replace outcome and <predictors> as appropriate.
> > This is much quicker and less trouble than bootstrapping.
> >
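> > To find the appropriate cutpoint for a given sensitivity, you can use -centile- with -if-.
> > For a sensitivity of 90% (10% of cases fall below the cutpoint):
> > centile pred if outcome == 1, centile(10)
> > Likewise for a specificity of 90% (90% of non-cases fall below the cutpoint):
> > centile pred if outcome == 0, centile(90)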
> >
> > Best wishes,
> >
> > Paul T Seed, Women's Health, KCL
> >
> >
> >
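For Ragnhild's second question, the sensitivity at a fixed false-positive rate, the -centile- idea can be taken one step further. What follows is a minimal sketch, again on simulated data with hypothetical variable names (x1, x2, testpos) and an assumed target FPR of 10%. The cutpoint is taken as the 90th centile of pred among the non-cases, and the sensitivity is the proportion of cases with pred at or above that cutpoint. Because the data are finite, the FPR actually achieved is close to, but not necessarily exactly, 10%.

```stata
* Simulated data; outcome, x1 and x2 are hypothetical stand-ins
* for your own variables
clear
set seed 20131022
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate byte outcome = runiform() < invlogit(-1 + x1 + 0.5*x2)

logistic outcome x1 x2
predict pred, pr

* Assumed target FPR of 10%: the cutpoint is the 90th centile
* of pred among the non-cases (outcome == 0)
centile pred if outcome == 0, centile(90)
scalar cut = r(c_1)

* Classify as test-positive if pred is at or above the cutpoint
generate byte testpos = pred >= scalar(cut)

* Sensitivity: proportion of cases (outcome == 1) classified positive
summarize testpos if outcome == 1
display "Cutpoint = " %6.4f scalar(cut) "   Sensitivity = " %6.4f r(mean)

* Check the false-positive rate actually achieved among non-cases
summarize testpos if outcome == 0
display "False-positive rate = " %6.4f r(mean)
```

Alternatively, -roctab outcome pred, detail- lists the sensitivity and specificity at every observed cutpoint, and -estat classification, cutoff(#)- after -logistic- reports them at a single chosen cutoff. As Roger points out, a cutpoint chosen and evaluated in the same data will tend to look better than it really is.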

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- SV: st: ROC-curves
  - From: Ragnhild Bergene Skråstad <[email protected]>
