[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"G. ter Riet" <G.terRiet@amc.uva.nl> |

To |
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |

Subject |
st: Tibshirani's enhanced bootstrap (ado file) |

Date |
Thu, 23 Oct 2008 15:35:44 +0200 |

Dear Statalisters, Last August, I wrote a somewhat primitive, yet working do-file to perform the enhanced bootstrap procedure to correct for over-optimism in predictive logistic models. I asked Patrick Royston's help to turn that do-file into something more useful for the Stata community. He kindly helped me out and his code works (see below). However, two issues remain: there is no help file and there is a question about what is the best way of reporting the Standard Deviations (SD) over the bootstrap samples (see below). Below I copied our correspondence on this topic. I feel that neither my statistical knowledge, nor my programming abilities are firm enough to write a good help file and decide on and program the correct SD. I hope that someone on the list might volunteer to (slightly) edit Patrick Royston's code, if needed, and perhaps write a help file to the program to ensure that the Stata community can fully benefit from what to me seems a useful procedure. In particular, I think that P atrick's suggestion might be correct that "It may be preferable to report the SD over the bootstrap samples, this being the bootstrap-based SE of the optimism" (see below). That part of the code would then need changing. It would be wonderful of course if Patrick's suggestions of generalization to other commands than just logit could be realized. Sent: 28 August 2008 10:01 To: Patrick Royston Subject: enhanced bootstrap Dear professor Royston, dear Patrick, I have recently programmed a do file performing (I hope) Tibshirani's 'enhanced bootstrap' to estimate overoptimism in predictive models. Unfortunately, my programing abilities are not as they should be. I wondered whether you think that the Stata community might benefit from a more professional ado version of this programme (I could not find one on the internet). If so, would you have time and interest to help me modify this do file into an ado file for general use? A (non-mathematical) description of the enhanced bootstrap may be found on: http://books.google.nl/books?id=kfHrF-bVcvQC&pg=PA94&lpg=PA94&dq=%22enhanced+bootstrap%22&source=web&ots=33G--3dhD3&sig=rGjRgs0N4-n8ND5CEpiHl42ZPB0&hl=nl&sa=X&oi=book_result&resnum=5&ct=result Many thanks for your consideration. Kind regards, Gerben ter Riet On August 31 Patrick Royston replied: Dear Gerben, Here is a stab at an ado-file for your bootstrap. I am sending two versions. The first (bstrap_enhanced) uses bsample and preserve/restore, whereas the second (bstrap_enhanced2) uses the bsample,weight() approach and is more than twice as fast, since the data are not disturbed. Both of these programs could be enhanced and generalized, e.g. to use commands other than logit such stcox, but one would of course need to consider performance measures alternative to AUROC. The syntax is bstrap_enhanced[2] yvar xvars [if] [in] [ , Reps(#) Seed(#) ] where reps defaults to 200, and seed to 0, ie. no random number seed is set. The output includes the mean optimism and its SE over the bootstrap samples. It may be preferable to report the SD over the bootstrap samples, this being the bootstrap-based SE of the optimism; you might take a look at what the literature suggests. Either way it is a simple matter of modifying the line "return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps' - 1) / `reps' )". Feel free to do what you like with the programs. It is fine if you wish to distribute them on Statalist or SSC archive, but if you do the latter you would need to write a help file. Hope this helps Best wishes Patrick Royston Here is the code that Patrick provided for bootstrap_enhanced (capitals were mine): *THIS FILE COMPUTES AUCs TO CALCULATE THE EXTENT OF OVEROPTIMISM INVOLVED *IN A PREDICTION MODEL USING TIBSHIRANI'S ENHANCED BOOTSTRAP *AS DESCRIBED IN HARRELL'S BOOK ON MODELLING ON PAGE 94. program define bstrap_enhanced, rclass version 10 syntax varlist(numeric min=2) [if] [in] [, Reps(int 200) Seed(int 0) ] gettoken yvar xvars : varlist marksample touse tempname opt sopt ssopt area0 area quietly { if `seed' > 0 set seed `seed' scalar `sopt' = 0 scalar `ssopt' = 0 // Compute AUC on original data logit `yvar' `xvars' if `touse' lroc, nograph scalar `area0' = r(area) forvalues i = 1 / `reps' { preserve bsample logit `yvar' `xvars' if `touse' lroc, nograph scalar `area' = r(area) // apply bsample's coefficients to original population restore lroc, nograph scalar `opt' = `area' - r(area) scalar `sopt' = `sopt' + `opt' scalar `ssopt' = `ssopt' + `opt' ^ 2 } } return scalar auc = `area0' return scalar opt = `sopt' / `reps' return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps' - 1) / `reps' ) di as txt _n "AUC = " as res %6.4f return(auc) /// as txt " optimism = " %6.4f as res return(opt) /// as txt " SE = " %6.4f as res return(opt_se) end Here is the code that Patrick provided for bootstrap_enhanced2: *! v 1.0.0 PR 31aug2008 program define bstrap_enhanced2, rclass version 10 syntax varlist(numeric min=2) [if] [in] [, Reps(int 200) Seed(int 0) ] gettoken yvar xvars : varlist marksample touse tempname opt sopt ssopt area0 area tempvar w xb quietly { if `seed' > 0 set seed `seed' scalar `sopt' = 0 scalar `ssopt' = 0 // Compute AUC on original data logit `yvar' `xvars' if `touse' lroc, nograph scalar `area0' = r(area) gen long `w' = . forvalues i = 1 / `reps' { bsample, weight(`w') logit `yvar' `xvars' if `touse' [fweight = `w'] lroc, nograph scalar `area' = r(area) // apply bsample's coefficients to original population predict `xb' if `touse' logit `yvar' `xb' lroc, nograph scalar `opt' = `area' - r(area) scalar `sopt' = `sopt' + `opt' scalar `ssopt' = `ssopt' + `opt' ^ 2 drop `xb' } } return scalar auc = `area0' return scalar opt = `sopt' / `reps' return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps' - 1) / `reps' ) di as txt _n "AUC = " as res %6.4f return(auc) /// as txt " optimism = " %6.4f as res return(opt) /// as txt " SE = " %6.4f as res return(opt_se) end To be complete, my original do file looked like this and uses the auto.dta file to illustrate the analysis: capture program drop bstrap_enhanced program define bstrap_enhanced, rclass version 10 tempname auc tempvar area kans area2 sysuse auto,clear g price_d1=(price>5000) postfile `auc' cons weight mpg trunk area area2 using roc1, replace qui { forval i = 1/200 { bsample logit price_d1 weight mpg trunk local cons = _b[_cons] local weight = _b[weight] local trunk = _b[trunk] local mpg = _b[mpg] lroc, nogr local area = r(area) * apply bsample's coefficients to original population sysuse auto, clear g price_d1=(price>5000) g kans = 1/(1+(exp(-(`cons'+`weight'*weight+`mpg'*mpg+`trunk'*trunk)))) roctab price_d1 kans, hanley local area2 = r(area) drop kans post `auc' (`cons') (`weight') (`mpg') (`trunk') (`area') (`area2') } } postclose `auc' use roc1,clear g optimism = area - area2 su cons weight mpg trunk area area2 optimism end *NOW RUN THE MODEL ON THE ORIGINAL DATA SET, CALCULATE AUC AND SUBTRACT MEAN OPTIMISM FROM THAT LATTER AUC Best wishes, Gerben ter Riet, epidemiologist, Dept of General Practice, AMC, Amsterdam , Netherlands * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**st: RE: Tibshirani's enhanced bootstrap (ado file)***From:*"Martin Weiss" <martin.weiss1@gmx.de>

- Prev by Date:
**Re: st: RE: Calling a Do-file** - Next by Date:
**Re: st: Problem with odbc load and long path name** - Previous by thread:
**st: egen(cut)** - Next by thread:
**st: RE: Tibshirani's enhanced bootstrap (ado file)** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |