On Thu, 26 Oct 2006, Scott Cunningham wrote:
I'm trying to determine how many replications to use with -bsqreg- and am
looking over -bssize-. This 3-step process is going to take a long time if I
have to first estimate the model using over 1000 replications, let alone run
the additional two steps. Do many of you recommend this 3-step approach from
Andrews and Buchinsky's 2000 Econometrica article, though?
Scott,
Because -bsqreg- does not save the bootstrap parameter estimates in a
datafile, -bssize refine- and -bssize analyze- will not work with it.
However, that's not a problem, since we can use the bootstrap: prefix
command with -qreg-.
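
As a rough do-file sketch of how the three steps might fit together with the
bootstrap: prefix (the replication counts below are placeholders, not
results; the -bssize initial- subcommand name and the options passed to
-bssize refine- are assumptions to check against -help bssize-):

```stata
* Step 1: initial number of replications for the chosen pdb()
bssize initial, pdb(5)

* Run that many replications and save the bootstrap estimates
sysuse auto, clear
set seed 1
local B1 = 768      // placeholder: use the number reported in step 1
bootstrap b_gear = _b[gear_ratio], reps(`B1') saving(bsdata, replace): ///
    qreg mpg gear_ratio

* Step 2: refine the number of replications using the saved estimates
bssize refine using bsdata, pdb(5)

* Step 3: rerun with the refined number, then check the final results
local B2 = 1000     // placeholder: use the number reported in step 2
bootstrap b_gear = _b[gear_ratio], reps(`B2') saving(bsdata, replace): ///
    qreg mpg gear_ratio
bssize analyze using bsdata, pdb(5)
```

The saving() option is what makes this work: it writes the bootstrap
parameter estimates to a datafile that -bssize- can read.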
If 1,000 initial replications would require too much time, one alternative
is to accept a higher percentage deviation from the ideal bootstrap (the
one with infinitely many replications) by specifying a higher number in the
pdb() option. Another is to accept a higher probability that your estimated
statistic will deviate by more than pdb% by specifying a higher value for
tau().
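
To get a feel for that trade-off, here is a small sketch of the first-step
calculation as I understand it from Andrews and Buchinsky (2000) for
standard errors: B = int(10000 * z^2 * omega / pdb^2), where z is the
1 - tau/2 standard normal quantile. Setting omega = 0.5 assumes zero excess
kurtosis in the bootstrap estimates; that is an assumption of this sketch,
and the number -bssize- reports for your model may differ.

```stata
* omega = 0.5 corresponds to zero excess kurtosis (an assumption)
local omega = 0.5
foreach pdb in 5 10 {
    foreach tau in 0.05 0.10 {
        * first-step number of replications for this (pdb, tau) pair
        local B = floor(10000 * invnormal(1 - `tau'/2)^2 * `omega' / `pdb'^2)
        display "pdb = `pdb'   tau = `tau'   B = `B'"
    }
}
```

Because pdb enters squared in the denominator, doubling pdb cuts the
required number of replications by roughly a factor of four, whereas raising
tau reduces it more gradually.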
If you don't want to hassle with having to run some bootstrap replications
and then potentially having to run even more, then I'd suggest at least
using -bssize analyze- after running the number of replications you choose
to get some idea of whether your number was large enough.
For example, say I do a simple median regression with 100 bootstrap
replications:
. sysuse auto, clear
. set seed 1
. bootstrap b_gear = _b[gear_ratio], reps(100) saving(bsdata, replace): ///
      qreg mpg gear_ratio
(output omitted)
. bssize analyze using bsdata, pdb(5)
Analysis of bootstrap results for standard errors
---------------------------------------------------------------
Percent deviation (pdb)                               5.000
---------------------------------------------------------------
   Parameter |   Final Size             tau         1 - tau
-------------+-------------------------------------------------
      b_gear |          100           0.623           0.377
---------------------------------------------------------------
     Maximum                          0.623           0.377
Those results tell me that the probability that my bootstrap standard
error differs by more than 5 percent from the standard error I would
obtain with an infinite number of replications is over 62%! In plain
English, there is a good chance that my bootstrap standard error differs
quite substantially from what I would get using infinitely many
replications.
Even if I am willing to accept a percent deviation (pdb) of 10%, there is
still nearly a 1-in-3 chance that my standard errors miss that mark:
. bssize analyze using bsdata, pdb(10)
Analysis of bootstrap results for standard errors
---------------------------------------------------------------
Percent deviation (pdb)                              10.000
---------------------------------------------------------------
   Parameter |   Final Size             tau         1 - tau
-------------+-------------------------------------------------
      b_gear |          100           0.325           0.675
---------------------------------------------------------------
     Maximum                          0.325           0.675
What those results tell me is that if I want to have any reasonable level
of confidence (colloquially speaking) in my bootstrap standard errors, I
need to use more than 100 replications.
If I redo that analysis with 1000 replications instead of 100, the
probability of exceeding the 5% deviation threshold is 11.4%, and
the probability of exceeding the 10% threshold is only 0.2%. Although
certainly open to subjective interpretation, the analysis using 1000
replications suggests to me that 1000 is a reasonable number of
replications to use for this model and dataset.
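
Redoing the analysis with 1000 replications only requires changing reps().
A sketch, assuming the same seed and file name as in the 100-replication
run (the exact commands behind the 1000-replication figures are not shown
here, so the numbers you get may not match exactly):

```stata
sysuse auto, clear
set seed 1
* same model as before, now with 1000 bootstrap replications
bootstrap b_gear = _b[gear_ratio], reps(1000) saving(bsdata, replace): ///
    qreg mpg gear_ratio
* check both deviation thresholds against the saved estimates
bssize analyze using bsdata, pdb(5)
bssize analyze using bsdata, pdb(10)
```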
-- Brian Poi
-- bpoi@stata.com