Statalist


Re: st: impact of bootstrapping on predicted probabilities of logistic regression models


From   Maarten buis <[email protected]>
To   [email protected]
Subject   Re: st: impact of bootstrapping on predicted probabilities of logistic regression models
Date   Fri, 3 Oct 2008 15:35:33 +0100 (BST)

I don't think it is a bad idea, but I don't have time to implement it.
The one thing that would worry me is that not all covariate patterns
are likely to be present in every bootstrap sample. This becomes
especially problematic if some patterns are rare, because then such
rogue samples will be very common. A useful alternative would be to
look at -prchange-, which is part of the -spost- package (see -findit
spost-), and at -mfx- (see -help mfx-).
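
A minimal sketch of the requested calculation, under stated
assumptions, could look like the code below. The program name
-predpat-, the file name bootp, and the seed are hypothetical and not
part of the original post. The sketch assumes exactly three binary
(0/1) predictors and a main-effects model, and it is meant to be run
from a do-file. It computes each probability from the fitted
coefficients rather than from the observations, so a covariate pattern
that is absent from a resample still gets a value. Resamples in which
-logit- drops a predictor would still need separate attention.

```stata
capture program drop predpat
program predpat, rclass
    * hypothetical helper: outcome first, then exactly three 0/1 predictors
    syntax varlist(min=4 max=4)
    gettoken y xvars : varlist
    tokenize `xvars'
    logit `y' `xvars'
    * pattern order: p1 = (0,0,0), p2 = (0,0,1), ..., p8 = (1,1,1)
    local k = 0
    forvalues a = 0/1 {
        forvalues b = 0/1 {
            forvalues c = 0/1 {
                local ++k
                * probability implied by the coefficients for this pattern
                return scalar p`k' = invlogit(_b[_cons] + `a'*_b[`1'] + `b'*_b[`2'] + `c'*_b[`3'])
            }
        }
    }
end

* keep the 200 replicates in bootp.dta so they can be summarized afterwards
bootstrap p1=r(p1) p2=r(p2) p3=r(p3) p4=r(p4)           ///
          p5=r(p5) p6=r(p6) p7=r(p7) p8=r(p8),          ///
          reps(200) seed(12345) saving(bootp, replace): ///
          predpat depvar indepvar1 indepvar2 indepvar3

* percentile confidence intervals for the eight probabilities
estat bootstrap, percentile

* mean and spread of each probability across the replicates
preserve
use bootp, clear
summarize p1-p8
restore
```

The -summarize- output gives, for each covariate pattern, the average
predicted probability across the replicates and its standard
deviation, which is the variability asked about in the quoted message.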

-- Maarten

--- "G. ter Riet" <[email protected]> wrote:
> In medical research on prediction or diagnosis, we often use the
> bootstrap to calculate confidence intervals for an area under the
> curve, AUC, corresponding to a particular logistic regression model
> that is used for prediction of an event (e.g. death or some target
> illness). The AUC is a global measure for how well the model
> discriminates between those with and without an event. A program that
> does this might look as follows:
> 
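> capture program drop rocb
> program rocb, rclass
>     * first variable is the outcome, the remaining ones are predictors
>     syntax varlist(min=2)
>     gettoken depvar : varlist
>     * a temporary variable cannot clash with existing names
>     tempvar p
>     logit `varlist'
>     predict `p'
>     roctab `depvar' `p'
>     return scalar rocb = r(area)
> end
> bootstrap auc=r(rocb), reps(200): rocb depvar indepvar1 indepvar2 indepvar3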
> 
> What I should like to do, however, is to give readers of my paper an
> impression of the impact of bootstrapping, not just on the AUC, but
> on the distribution of predicted probabilities calculated from the
> logistic model since most clinicians are not that comfortable with
> AUCs of a ROC curve.  Suppose my indepvars are all binary. Then I'd
> have 2^3=8 covariate patterns and potentially a unique predicted
> probability for each covariate pattern for each bootstrap sample. For
> each covariate pattern, I'd like to average the predicted
> probabilities across the 200 samples (and perhaps say something about
> their variability).
> My Stata programming skills are not good enough to solve this
> efficiently. I'd greatly appreciate any help. Of course, any comments
> on whether you think the whole idea is worthwhile are welcome too.
> Cheers, Gerben ter Riet (epidemiologist, AMC, Dept General Practice,
> Amsterdam, NL)
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      


