Statalist The Stata Listserver



Re: st: bssize


From   "Brian P. Poi" <bpoi@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: bssize
Date   Thu, 26 Oct 2006 09:47:19 -0500 (CDT)

On Thu, 26 Oct 2006, Scott Cunningham wrote:

I'm trying to determine how many replications to use with -bsqreg- and am looking over -bssize-. This 3-step process is going to take a long time if I have to first estimate the model using over 1000 replications, let alone run the additional two steps. Do many of you suggest this 3-step approach from Andrews and Buchinsky's 2000 Econometrica article, though?

Scott,

Because -bsqreg- does not save the bootstrap parameter estimates in a datafile, -bssize refine- and -bssize analyze- will not work with it.
However, that's not a problem, since we can use the bootstrap: prefix command with -qreg-.

If 1,000 initial replications would require too much time, one alternative is to accept a larger percentage deviation from the ideal bootstrap value (the one you would get with infinitely many replications) by specifying a higher number in the pdb() option. Another is to accept a higher probability that your estimated statistic deviates by more than pdb percent by specifying a higher value for tau().

If you'd rather not run some bootstrap replications and then possibly have to run even more, I'd suggest at least using -bssize analyze- after running the number of replications you choose, to get some idea of whether that number was large enough.

For example, say I do a simple median regression with 100 bootstrap replications:

. sysuse auto, clear
. set seed 1
. bootstrap b_gear = _b[gear], reps(100) saving(bsdata, replace): ///
        qreg mpg gear
(output omitted)
. bssize analyze using bsdata, pdb(5)

Analysis of bootstrap results for standard errors
---------------------------------------------------------------
Percent deviation (pdb)                                   5.000
---------------------------------------------------------------
   Parameter |   Final Size          tau      1 - tau
-------------+-------------------------------------------------
      b_gear |          100        0.623        0.377
---------------------------------------------------------------
     Maximum                       0.623        0.377


Those results tell me that the probability that my bootstrap standard error differs by more than 5 percent from the standard error I would obtain with an infinite number of replications is over 62%! In plain English, there is a good chance that my bootstrap standard error differs quite substantially from what I would get using infinitely many replications.

Even if I am willing to accept a percent deviation (pdb) of 10%, there is still nearly a 1-in-3 chance that my standard errors miss that mark:

. bssize analyze using bsdata, pdb(10)

Analysis of bootstrap results for standard errors
---------------------------------------------------------------
Percent deviation (pdb)                                  10.000
---------------------------------------------------------------
   Parameter |   Final Size          tau      1 - tau
-------------+-------------------------------------------------
      b_gear |          100        0.325        0.675
---------------------------------------------------------------
     Maximum                       0.325        0.675


What those results tell me is that if I want to have any reasonable level of confidence (colloquially speaking) in my bootstrap standard errors, I need to use more than 100 replications.
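
As a rough cross-check on the two tables above, the reported tau values can be related to each other with built-in Stata functions. This is only a sketch: it assumes the normal-approximation rule from Andrews and Buchinsky, under which the implied normal quantile z(1 - tau/2) is proportional to pdb * sqrt(B) for a given set of bootstrap draws. The scalar name z5 is made up for illustration, and the inputs 0.623, pdb = 5, and B = 100 are simply read off the first table.

```stata
* Sketch only: assumes z(1 - tau/2) is proportional to pdb*sqrt(B)
* for a fixed set of bootstrap draws (normal approximation).
* Inputs (tau = 0.623 at pdb = 5, B = 100) come from the first table.
scalar z5 = invnormal(1 - 0.623/2)

* Doubling pdb from 5 to 10 doubles the implied quantile;
* convert the doubled quantile back into a two-sided tail probability.
display "implied tau at pdb = 10: " %5.3f 2*(1 - normal(2*z5))

* Rough number of replications for tau = 0.05 at pdb = 5,
* scaling B = 100 by the squared ratio of quantiles.
display "approx. reps for tau = 0.05, pdb = 5: " ceil(100*(invnormal(0.975)/z5)^2)
```

Under that assumption the first display should land close to the 0.325 that -bssize analyze- reports for pdb(10). The second figure is only a ballpark, since the underlying variance term is itself estimated from just 100 draws.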

If I redo that analysis with 1,000 replications instead of 100, the probability of exceeding the 5 percent deviation threshold is 11.4%, and the probability of exceeding the 10 percent threshold is only 0.2%. This is certainly open to subjective interpretation, but the analysis suggests to me that 1,000 is a reasonable number of replications to use for this model and dataset.
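
For anyone who wants to repeat the 1,000-replication check, a do-file along these lines should work. It is the same set of commands as the 100-replication example with reps() changed. The file name bsdata1000 is just a placeholder, and the seed used for the 1,000-replication run is not stated above, so the exact tau values may differ somewhat from those quoted. It also assumes the user-written -bssize- command is already installed.

```stata
* Sketch of the 1,000-replication run; bsdata1000 is a placeholder name.
* The seed is an assumption, so reported tau values may differ slightly.
sysuse auto, clear
set seed 1
bootstrap b_gear = _b[gear], reps(1000) saving(bsdata1000, replace): ///
    qreg mpg gear

* Probability of missing the ideal bootstrap standard error by more
* than 5 percent, then by more than 10 percent.
bssize analyze using bsdata1000, pdb(5)
bssize analyze using bsdata1000, pdb(10)
```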

-- Brian Poi
-- bpoi@stata.com