*Note: The following question and answer is based on an exchange that started on
Statalist.*

Title: Guidelines for bootstrap samples

Authors: William Gould, StataCorp; Jeff Pitblado, StataCorp

I am running a negative binomial regression on a sample of 488 firms. For various reasons [...], I decided to use the bootstrapping procedure in Stata on my data. Are there general guidelines that have been proposed for how large the bootstrapped samples should be relative to the total number of cases in the dataset from which they are drawn?

When using the bootstrap to estimate standard errors and to construct
confidence intervals, the original sample size should be used. Consider a
simple example where we wish to bootstrap the coefficient on **foreign**
from a regression of **mpg** on **weight** and **foreign** using the
automobile data. The sample size is 74, but suppose we draw only 37
observations (half of the observed sample size) each time we resample the
data 2,000 times.

```
. sysuse auto, clear
. set seed 3957574
. bootstrap _b[foreign], size(37) reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.661728    -0.99   0.321    -4.906956    1.606898
------------------------------------------------------------------------------
```

Now consider the same exercise, resampling all 74 observations.

```
. set seed 91857785
. bootstrap _b[foreign], reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.121612    -1.47   0.141    -3.848348    .5482899
------------------------------------------------------------------------------
```

As explained below, the difference in the bias estimates is due to the
random nature of the bootstrap and not the number of observations taken for
each replication. However, the standard error estimates are dependent upon
the number of observations in each replication. Here, on average, we would
expect the variance estimate of **_b[foreign]** to be twice as large for
a sample of 37 observations as for a sample of 74 observations. This is due
mainly to the form of the variance of the sample mean, s^{2}/n: halving n
doubles the variance.
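As a quick arithmetic check, the ratio of the two bootstrap variance estimates reported above works out to about 2.2, consistent with the variance roughly doubling (plus simulation noise) when the replication size is halved:

```stata
. display (1.661728/1.121612)^2
```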

The number of observations in the original underlying dataset does not play a role in determining the number of replications required to get good bootstrap variance estimates. The dataset must have enough observations (preferably an infinite number) so that the empirical distribution can be used as an approximation to the population's true distribution.

In terms of the number of replications, there is no fixed answer such as “250” or “1,000” to the question. The right answer is that you should choose an infinite number of replications because, at a formal level, that is what the bootstrap requires. The key to the usefulness of the bootstrap is that it converges in terms of numbers of replications reasonably quickly, and so running a finite number of replications is good enough—assuming the number of replications chosen is large enough.

The above statement contains the key to choosing the right number of replications. Here is the recipe:

- Choose a large but tolerable number of replications. Obtain the bootstrap estimates.
- Change the random-number seed. Obtain the bootstrap estimates again, using the same number of replications.
- Do the results change meaningfully? If so, the first number you chose was too small. Try a larger number. If results are similar enough, you probably have a large enough number. To be sure, you should probably perform step 2 a few more times, but I seldom do.
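For instance, applying this recipe to the regression used above (the seed values and replication count here are arbitrary choices for illustration):

```stata
. sysuse auto, clear
. set seed 1
. bootstrap _b[foreign], reps(1000): regress mpg weight foreign
. set seed 2
. bootstrap _b[foreign], reps(1000): regress mpg weight foreign
```

If the two runs report standard errors that agree to the precision you need, 1,000 replications is probably enough; if not, increase reps() and repeat.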

Whether results change meaningfully is a matter of judgment and has to be interpreted in light of the problem at hand. How accurately do you need the standard errors, confidence intervals, etc.? Often, a few digits of precision are good enough because, even if the standard error were calculated perfectly, you would have to ask yourself how much you believe your model in terms of all the other assumptions that went into it. For instance, in a Becker earnings model of the return to schooling, you might tell me the return is 6% with a standard error of 1, and I might believe you. If you told me the return is 6.10394884% with a standard error of .9899394, you would have more precision but would not have provided any additional useful information.

If you want more precision, it may take more replications than you would guess. Using the automobile data, I looked at linear regression,

```
. regress mpg weight foreign
```

and obtained the bootstrapped standard error for **_b[foreign]**. I did
this for 20 replications, 40, 60, all the way up to 4,000. Here is a graph
of the results as a function of the number of replications:

The vertical axis shows the bootstrapped standard error for
**_b[foreign]**. Even with more than 1,000 replications, the standard
error varied between 1.10 and 1.20, and 90% of the results were between 1.11
and 1.18. As a side experiment, I ran

```
. bootstrap _b[foreign], reps(20000): regress mpg weight foreign
```

twice and got reported standard errors of 1.14 and 1.16. At 40,000 replications, I got a reported standard error of 1.14.

Here is the program I used to obtain the above graph:

```
capture program drop Accum
program Accum
        // accumulate bootstrap SE and bias estimates for increasing reps()
        postfile results se bias n using sim, replace
        forvalues n = 20(20)4000 {
                noisily display " `n'" _c
                quietly bootstrap _b[foreign] e(N), reps(`n'): ///
                        regress mpg weight foreign
                tempname bias
                matrix `bias' = e(bias)
                local b_bias = `bias'[1,1]
                local n = e(N_reps)
                local se = _se[_bs_1]
                post results (`se') (`b_bias') (`n')
        }
        postclose results
end

clear
sysuse auto
set seed 12345
Accum

// plot the accumulated results
use sim, clear
scatter se n, xtitle("replications") ytitle("bootstrap standard error")
```