Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: bootstrapping problem: difference between two numbers


From   "Kieran McCaul" <kamccaul@meddent.uwa.edu.au>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: bootstrapping problem: difference between two numbers
Date   Wed, 23 Jul 2008 15:52:27 +0800

I can't comment on the rest of your program, but in your line:
	count if 1/(1+exp(-(_b[`_cons']+_b[`x1']*`x1')))>.75

you should have _b[_cons] not _b[`_cons']


______________________________________________
Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2140
Phone: -61-8-9224-2140
email: kamccaul@meddent.uwa.edu.au
http://myprofile.cos.com/mccaul 
_______________________________________________


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of G. ter Riet
Sent: Wednesday, 23 July 2008 2:25 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: bootstrapping problem: difference between two numbers

Dear Stata listers,
I am trying to work out how much it matters if people use different
definitions of asthma in prediction models. To get rid of any effects of
different choices of study populations and study designs, we perform a
number of logistic regression models that use different definitions of
the dependent variable (asthma) and a fixed set of predictors in a
single database (= 1 population and 1 study design) that we have access
to. I think I successfully bootstrapped the areas under the ROC curve of
two models like this (using the auto.dta data set for illustration):

clear
sysuse auto
*GENERATE THREE BINARY DEPENDENT VARS TO MIMICK OUR THREE ASTHMA
DEFINITIONS g price_d1=(price>5000) *g price_d2=(price>6342) g
price_d3=(price>10000) *PROGRAM TWO LOGISTIC REGRESSIONS CONTAINING THE
SAME XVARS BUT TWO DIFFERENT DEPVARS capture program drop gs_arcade_auc
program define gs_arcade_auc, rclass
	version 10
	args y1 x1 y2 x2  
	confirm var `y1'
	confirm var `x1'
	confirm var `y2'
	confirm var `x2'
	tempname auc1 auc2
	logit `y1' `x1'
	lroc, nograph
	scalar `auc1' = r(area)
	logit `y2' `x2'
	lroc, nograph	
	scalar `auc2' = r(area)
	return scalar dif1=`auc1'-`auc2'
end

#delimit ;
bootstrap dif1=r(dif1),saving(c:\temp\gs_arcade_auc, replace) seed(2345)
reps(1000) nowarn nodots: gs_arcade_auc price_d1 weight price_d3 weight
; estat bootstrap, all

However, when I try to bootstrap a quantity that I think clinicians are
more interested in than the area under the ROC curve, namely, the
difference in the number of patients above some fixed (treatment)
threshold, the program won't run. I can't figure out why not. What would
be a good way to program this? I used <count> since <predict> seems to
save nothing in r(). Any help is greatly appreciated. Here is the code
that won't run:

capture program drop gs_arcade_thresh
program define gs_arcade_thresh, rclass
	version 10
	args y1 x1 y2 x2  
	confirm var `y1'
	confirm var `x1'
	confirm var `y2'
	confirm var `x2'
	tempname treat1 treat2
	logit `y1' `x1'
*COUNT THE NUMBER OF PATIENTS WHOSE PREDICTED PROBABILITY IS GREATER
THAN .75
	count if 1/(1+exp(-(_b[`_cons']+_b[`x1']*`x1')))>.75
	scalar `treat1' = r(N)
	logit `y2' `x2'
	count if 1/(1+exp(-(_b[`_cons']+_b[`x2']*`x2')))>.75
	scalar `treat2' = r(N)

	return scalar dif1=`treat1'-`treat2'

end

#delimit ;
bootstrap dif1=r(dif1),saving(c:\temp\gs_arcade_thresh, replace)
seed(2345)
reps(100): gs_arcade_thresh price_d1 weight price_d3 weight ;

 
Many thanks,
Gerben ter Riet, Ac Medical Center, Amsterdam, The Netherlands


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index