The following question and answer is based on an exchange that started on
Statalist.
How large should the bootstrapped samples be relative to the total
number of cases in the dataset?
Title
    Guidelines for bootstrap samples
Author
    William Gould, StataCorp
    Jeff Pitblado, StataCorp
Date
    November 2001; updated August 2010
Question:
I am running a negative binomial regression on a sample of 488 firms. For
various reasons [...], I decided to use the bootstrapping procedure in Stata
on my data. Are there general guidelines that have been proposed for how
large the bootstrapped samples should be relative to the total number of
cases in the dataset from which they are drawn?
Answer:
When using the bootstrap to estimate standard errors and to construct
confidence intervals, the original sample size should be used. Consider a
simple example where we wish to bootstrap the coefficient on foreign
from a regression of weight and foreign on mpg from the
automobile data. The sample size is 74, but suppose that in each of 2,000
resamples we draw only 37 observations (half of the observed sample size).
. sysuse auto, clear
. set seed 3957574
. bootstrap _b[foreign], size(37) reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)
Bootstrap replications (2000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
(output omitted)
.................................................. 1900
.................................................. 1950
.................................................. 2000
Linear regression Number of obs = 74
Replications = 2000
command: regress mpg weight foreign
_bs_1: _b[foreign]
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_bs_1 | -1.650029 1.67859 -0.98 0.326 -4.940006 1.639948
------------------------------------------------------------------------------
Now consider the same exercise with 74 observations.
. set seed 91857785
. bootstrap _b[foreign], reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)
Bootstrap replications (2000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
(output omitted)
.................................................. 1900
.................................................. 1950
.................................................. 2000
Linear regression Number of obs = 74
Replications = 2000
command: regress mpg weight foreign
_bs_1: _b[foreign]
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_bs_1 | -1.650029 1.116423 -1.48 0.139 -3.838179 .5381206
------------------------------------------------------------------------------
As explained below, the difference in the bias estimates is due to the
random nature of the bootstrap and not to the number of observations taken
in each replication. The standard error estimates, however, do depend on
the number of observations in each replication. Here, on average, we would
expect the variance estimate of _b[foreign] for samples of 37 observations
to be twice as large as that for samples of 74 observations. This follows
mainly from the form of the variance of the sample mean, s^2/n.
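We can check this against the two runs above: the ratio of the two squared
standard errors should be roughly 2. (It will not be exact, both because of
bootstrap randomness and because the statistic is a regression coefficient
rather than a simple mean.)

    . display (1.67859/1.116423)^2

The result is about 2.26, in the neighborhood of the factor of 2 predicted
by the s^2/n argument.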
The number of observations in the original underlying dataset does not play
a role in determining the number of replications required to get good
bootstrap variance estimates. The dataset must have enough observations
(preferably an infinite number) so that the empirical distribution can be
used as an approximation to the population's true distribution.
In terms of the number of replications, there is no fixed answer such as
“250” or “1,000” to the question. The right answer
is that you should choose an infinite number of replications because, at a
formal level, that is what the bootstrap requires. The key to the
usefulness of the bootstrap is that it converges in terms of numbers of
replications reasonably quickly, and so running a finite number of
replications is good enough—assuming the number of replications chosen
is large enough.
The above statement contains the key to choosing the right number of
replications. Here is the recipe:
 1. Choose a large but tolerable number of replications. Obtain the
    bootstrap estimates.
 2. Change the random-number seed. Obtain the bootstrap estimates
    again, using the same number of replications.
 3. Do the results change meaningfully? If so, the first number you
    chose was too small. Try a larger number. If results are similar
    enough, you probably have a large enough number. To be sure, you
    should probably perform step 2 a few more times, but I seldom do.
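In Stata, the recipe amounts to rerunning the same bootstrap under a
different seed and comparing the reported standard errors. A minimal sketch
with the automobile data (the seeds and the choice of 1,000 replications
are arbitrary):

    . sysuse auto, clear
    . set seed 12345
    . bootstrap _b[foreign], reps(1000): regress mpg weight foreign
    . set seed 54321
    . bootstrap _b[foreign], reps(1000): regress mpg weight foreign

If the two reported standard errors agree to the precision you care about,
1,000 replications is probably enough for this problem; otherwise, increase
reps() and repeat.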
Whether results change meaningfully is a matter of judgment and has to be
interpreted given the problem at hand. How accurate do you need the
standard errors, confidence intervals, etc.? Often, a few digits of
precision is good enough because, even if you had the standard error
calculated perfectly, you have to ask yourself how much you believe your
model in terms of all the other assumptions that went into it. For
instance, in a Becker earnings model of the return to schooling, you might
tell me return is 6% with a standard error of 1, and I might believe
you. If you told me the return is 6.10394884% and the standard error is
.9899394, you have more precision but have not provided any additional
useful information.
If you want more precision, it may take more replications than you would
guess. Using the automobile data, I looked at linear regression,
. regress mpg weight foreign
and obtained the bootstrapped standard error for _b[foreign]. I did
this for 20 replications, 40, 60, all the way up to 4,000. Here is a graph
of the results as a function of the number of replications:
    (graph omitted)
The vertical axis shows the bootstrapped standard error for
_b[foreign]. Even with more than 1,000 replications, the standard
error varied between 1.09 and 1.21, and 90% of the results were between 1.11
and 1.18. As a side experiment, I ran
. bootstrap _b[foreign], reps(20000): regress mpg weight foreign
twice and got a reported standard error of 1.14 and 1.16. At 40,000
replications, I got a reported standard error of 1.14.
Here is the program I used to obtain the above graph:
capture program drop Accum
program Accum
        // For each replication count n, bootstrap the standard error of
        // _b[foreign] and post the results to sim.dta
        postfile results se bias n using sim, replace
        forvalues n = 20(20)4000 {
                noisily display " `n'" _c
                quietly bootstrap _b[foreign], reps(`n'): ///
                        regress mpg weight foreign
                tempname bias
                matrix `bias' = e(bias)
                local b_bias = `bias'[1,1]
                local reps = e(N_reps)    // replications that completed
                local se = _se[_bs_1]
                post results (`se') (`b_bias') (`reps')
        }
        postclose results
end
clear
sysuse auto
set seed 12345
Accum
use sim, clear
scatter se n, xtitle("replications") ytitle("bootstrap standard error")
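To reproduce the kind of spread quoted above (for example, the 90% range of
the standard errors among runs with more than 1,000 replications), the
posted results can be summarized directly. A sketch, assuming sim.dta was
produced by Accum as above:

    . use sim, clear
    . _pctile se if n > 1000, percentiles(5 95)
    . display "90% of the standard errors lie between " r(r1) " and " r(r2)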