Statalist



Re: st: Bootstrap: Which standard errors to use?


From   "Stas Kolenikov" <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Bootstrap: Which standard errors to use?
Date   Mon, 8 Dec 2008 14:34:11 -0600

On 12/8/08, Antoine Terracol <terracol@univ-paris1.fr> wrote:
> > Those are exactly the reported standard errors in your second panel.
>  Which, if I followed the thread correctly, should not come as a surprise
> since Anupit's original -bootstrap- command called -logit, robust-

Right, I did not really pay much attention up there :)).

Well, the -robust- standard errors are in fact closer to the -oim-
standard errors than to the bootstrap standard errors. It is difficult
to come up with a meaningful suggestion in this situation as to which
standard errors are better. A (former) econometrician inside me would
like to remind you that when the 0/1 decision to buy something (which
is what this application seems to be about, judging by the variable
names at least) is treated as an imperfect observation of an
underlying continuous propensity to buy, the model is subject to scale
indeterminacy: the identified combinations of parameters are
"slope"/"standard deviation of the error term" rather than "slope", as
is the case with linear regression. Biostatisticians would rightfully
raise a brow here: "What is he talking about? This is a GLM with a
canonical link... and the scale parameter here is 1". Well, this is a
matter of interpretation! If you want an economics interpretation,
then you need to make sure you control the sigma in the denominator to
really talk about the betas being on the same scale (and only then
will the bootstrap make sense), which unfortunately cannot be
guaranteed.
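
To spell out the scale indeterminacy, here is a sketch of the latent
variable argument. The notation (y*, u, Lambda, c) is mine and is
only meant as an illustration:

```latex
\begin{aligned}
y_i^* &= x_i'\beta + \sigma u_i,
  \qquad u_i \sim \text{standard logistic with cdf } \Lambda, \\
y_i &= \mathbf{1}\{ y_i^* > 0 \}, \\
\Pr(y_i = 1 \mid x_i)
  &= \Pr\!\left( u_i > -x_i'\beta/\sigma \right)
   = \Lambda\!\left( x_i'\beta/\sigma \right).
\end{aligned}
```

The last equality uses the symmetry of the logistic distribution.
Replacing (beta, sigma) with (c*beta, c*sigma) for any c > 0 leaves
every probability, and hence the likelihood, unchanged, so the data
identify only beta/sigma. What -logit- reports as the coefficients is
that ratio under the normalization sigma = 1.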

Another aspect is the numeric stability of the logistic regression
estimates. For some bootstrap samples, the logit estimates are not
defined: say, if you sampled all zeroes, or only as many ones as you
have regressors in the model, so that the outcome of 1 can be
perfectly predicted with coefficient values at infinity. Quite likely,
the samples that are "close", in some sense, to those extreme outcomes
also produce "large" estimates of the coefficients. Are those sensible
outcomes for the bootstrap? Probably not; hence the bootstrap
procedure might need to be modified to control the relative
proportions of 0s and 1s. In the simplest version, you do some sort of
stratified bootstrap: resample separately as many zero outcomes as
there were in the original sample, and as many ones as there were
originally. Is that a better bootstrap scheme? At least it takes care
of the infinite estimates issue. In Stata, you can do this by simply
adding -strata(response_variable)- to your -bootstrap- options.
Stratification usually brings down variances, and I would expect in
this case that the standard errors will be much closer to the -oim-
and -robust- ones.
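
A minimal sketch of that comparison, using the -auto- data shipped
with Stata rather than the data from this thread. The model (foreign
on mpg and weight), the number of replications, the seed, and the
stored names are placeholders of mine, so substitute your own:

```stata
* Example data shipped with Stata; -foreign- stands in for the 0/1 response
sysuse auto, clear

* Analytic standard errors for reference
quietly logit foreign mpg weight, vce(oim)
estimates store oim
quietly logit foreign mpg weight, vce(robust)
estimates store robust

* Plain bootstrap: the mix of 0s and 1s varies from resample to resample
quietly bootstrap _b, reps(500) seed(12345): logit foreign mpg weight
estimates store bs

* Stratified bootstrap: each resample keeps the original counts of 0s and 1s
quietly bootstrap _b, reps(500) seed(12345) strata(foreign): logit foreign mpg weight
estimates store bs_strat

* Coefficients with the four sets of standard errors side by side
estimates table oim robust bs bs_strat, b(%9.4f) se(%9.4f)
```

The point estimates are identical in all four columns; only the
standard errors differ, which makes it easy to see how far each
bootstrap scheme is from the -oim- and -robust- numbers.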

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.