Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: Tibshirani's enhanced bootstrap (ado file)


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Tibshirani's enhanced bootstrap (ado file)
Date   Thu, 23 Oct 2008 15:42:57 +0200

If it is all about the technical aspects of writing a help file, try -help
examplehelpfile- and -view- it with the -asis- option as in

**********
qui findfile examplehelpfile.sthlp
*view "`r(fn)'"
view "`r(fn)'", asis
**********


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of G. ter Riet
Sent: Thursday, October 23, 2008 3:36 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Tibshirani's enhanced bootstrap (ado file)

Dear Statalisters,
Last August, I wrote a somewhat primitive, yet working do-file to perform
the enhanced bootstrap procedure to correct for over-optimism in predictive
logistic models. I asked Patrick Royston's help to turn that do-file into
something more useful for the Stata community. He kindly helped me out and
his code works (see below). However, two issues remain: there is no help
file and there is a question about what is the best way of reporting the
Standard Deviations (SD) over the bootstrap samples (see below). Below I
copied our correspondence on this topic. I feel that neither my statistical
knowledge, nor my programming abilities are firm enough to write a good help
file and decide on and program the correct SD. I hope that someone on the
list might volunteer to (slightly) edit Patrick Royston's code, if needed,
and perhaps write a help file to the program to ensure that the Stata
community can fully benefit from what to me seems a useful procedure. In
particular, I think that P
 
atrick's suggestion might be correct that "It may be preferable to report
the SD over the bootstrap samples, this being the bootstrap-based SE of the
optimism" (see below). That part of the code would then need changing. It
would be wonderful of course if Patrick's suggestions of generalization to
other commands than just logit could be realized.

Sent: 28 August 2008 10:01
To: Patrick Royston
Subject: enhanced bootstrap

Dear professor Royston, dear Patrick,
I have recently programmed a do file performing (I hope) Tibshirani's
'enhanced bootstrap' to estimate overoptimism in predictive models.
Unfortunately, my programing abilities are not as they should be. I wondered
whether you think that the Stata community might benefit from a more
professional ado version of this programme (I could not find one on the
internet). If so, would you have time and interest to help me modify this do
file into an ado file for general use?

A (non-mathematical) description of the enhanced bootstrap may be found on:

http://books.google.nl/books?id=kfHrF-bVcvQC&pg=PA94&lpg=PA94&dq=%22enhanced
+bootstrap%22&source=web&ots=33G--3dhD3&sig=rGjRgs0N4-n8ND5CEpiHl42ZPB0&hl=n
l&sa=X&oi=book_result&resnum=5&ct=result

Many thanks for your consideration.

Kind regards,
Gerben ter Riet

On August 31 Patrick Royston replied:
Dear Gerben,

Here is a stab at an ado-file for your bootstrap. I am sending two versions.
The first (bstrap_enhanced) uses bsample and preserve/restore, whereas the
second (bstrap_enhanced2) uses the bsample,weight() approach and is more
than twice as fast, since the data are not disturbed.

Both of these programs could be enhanced and generalized, e.g. to use
commands other than logit such stcox, but one would of course need to
consider performance measures alternative to AUROC.

The syntax is bstrap_enhanced[2] yvar xvars [if] [in] [ , Reps(#) Seed(#) ]

where reps defaults to 200, and seed to 0, ie. no random number seed is set.

The output includes the mean optimism and its SE over the bootstrap samples.
It may be preferable to report the SD over the bootstrap samples, this being
the bootstrap-based SE of the optimism; you might take a look at what the
literature suggests. Either way it is a simple matter of modifying the line
"return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps' - 1) /
`reps' )".

Feel free to do what you like with the programs. It is fine if you wish to
distribute them on Statalist or SSC archive, but if you do the latter you
would need to write a help file.

Hope this helps

Best wishes
Patrick Royston

Here is the code that Patrick provided for bootstrap_enhanced (capitals were
mine):
*THIS FILE COMPUTES AUCs TO CALCULATE THE EXTENT OF OVEROPTIMISM INVOLVED
*IN A PREDICTION MODEL USING TIBSHIRANI'S ENHANCED BOOTSTRAP
*AS DESCRIBED IN HARRELL'S BOOK ON MODELLING ON PAGE 94.

program define bstrap_enhanced, rclass
	version 10
	syntax varlist(numeric min=2) [if] [in] [, Reps(int 200) Seed(int 0)
]
	gettoken yvar xvars : varlist
	marksample touse
	tempname opt sopt ssopt area0 area
	quietly {
		if `seed' > 0 set seed `seed'
		scalar `sopt' = 0
		scalar `ssopt' = 0
		// Compute AUC on original data
		logit `yvar' `xvars' if `touse'
		lroc, nograph
		scalar `area0' = r(area)
		forvalues i = 1 / `reps' {
			preserve
			bsample
			logit `yvar' `xvars' if `touse'
			lroc, nograph
			scalar `area' = r(area)
			// apply bsample's coefficients to original
population
			restore
			lroc, nograph
			scalar `opt' = `area' - r(area)
			scalar `sopt' = `sopt' + `opt'
			scalar `ssopt' = `ssopt' + `opt' ^ 2
		}
	}
	return scalar auc = `area0'
	return scalar opt = `sopt' / `reps'
	return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps'
- 1) / `reps' )
	di as txt _n "AUC = " as res %6.4f return(auc) ///
	 as txt " optimism = " %6.4f as res return(opt) ///
	 as txt " SE = " %6.4f as res return(opt_se)
end


Here is the code that Patrick provided for bootstrap_enhanced2:
*! v 1.0.0 PR 31aug2008
program define bstrap_enhanced2, rclass
	version 10
	syntax varlist(numeric min=2) [if] [in] [, Reps(int 200) Seed(int 0)
]
	gettoken yvar xvars : varlist
	marksample touse
	tempname opt sopt ssopt area0 area
	tempvar w xb
	quietly {
		if `seed' > 0 set seed `seed'
		scalar `sopt' = 0
		scalar `ssopt' = 0
		// Compute AUC on original data
		logit `yvar' `xvars' if `touse'
		lroc, nograph
		scalar `area0' = r(area)
		gen long `w' = .
		forvalues i = 1 / `reps' {
			bsample, weight(`w')
			logit `yvar' `xvars' if `touse' [fweight = `w']
			lroc, nograph
			scalar `area' = r(area)
			// apply bsample's coefficients to original
population
			predict `xb' if `touse'
			logit `yvar' `xb'
			lroc, nograph
			scalar `opt' = `area' - r(area)
			scalar `sopt' = `sopt' + `opt'
			scalar `ssopt' = `ssopt' + `opt' ^ 2
			drop `xb'
		}
	}
	return scalar auc = `area0'
	return scalar opt = `sopt' / `reps'
	return scalar opt_se = sqrt( (`ssopt' - `sopt'^2 / `reps') / (`reps'
- 1) / `reps' )
	di as txt _n "AUC = " as res %6.4f return(auc) ///
	 as txt " optimism = " %6.4f as res return(opt) ///
	 as txt " SE = " %6.4f as res return(opt_se)
end


To be complete, my original do file looked like this and uses the auto.dta
file to illustrate the analysis:

capture program drop bstrap_enhanced
program define bstrap_enhanced, rclass
	version 10
	tempname auc
	tempvar area kans area2
	sysuse auto,clear
	g price_d1=(price>5000)
	postfile `auc' cons weight mpg trunk area area2 using roc1, replace
	qui   {
		forval i = 1/200 {
		bsample
		logit price_d1 weight mpg trunk
		local cons = _b[_cons]
		local weight = _b[weight]
		local trunk = _b[trunk]
		local mpg = _b[mpg]
		lroc, nogr
		local area = r(area)
		* apply bsample's coefficients to original population
		sysuse auto, clear
		g price_d1=(price>5000)
		g kans =
1/(1+(exp(-(`cons'+`weight'*weight+`mpg'*mpg+`trunk'*trunk))))
		roctab price_d1 kans, hanley
		local area2 = r(area)
		drop kans
		post `auc' (`cons') (`weight') (`mpg') (`trunk') (`area')
(`area2')
		}
	}
	postclose `auc'
	

use roc1,clear
g optimism = area - area2
su cons weight mpg trunk area area2 optimism
end

*NOW RUN THE MODEL ON THE ORIGINAL DATA SET, CALCULATE AUC AND SUBTRACT MEAN
OPTIMISM FROM THAT LATTER AUC

Best wishes,
Gerben ter Riet, epidemiologist, Dept of General Practice, AMC, Amsterdam ,
Netherlands


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index