FAQ: Guidelines for bootstrap samples

Note: The following question and answer are based on an exchange that started on Statalist.

How large should the bootstrapped samples be relative to the total number of cases in the dataset?

Title		Guidelines for bootstrap samples
Author		William Gould, StataCorp
		Jeff Pitblado, StataCorp

Note: The output in this FAQ is consistent with Stata 14 or newer versions. Because bootstrap is based on random draws, results differ from those of older versions, which did not use the 64-bit Mersenne Twister pseudorandom-number generator.

Question:

I am running a negative binomial regression on a sample of 488 firms. For various reasons [...], I decided to use the bootstrapping procedure in Stata on my data. Are there general guidelines that have been proposed for how large the bootstrapped samples should be relative to the total number of cases in the dataset from which they are drawn?

Answer:

When using the bootstrap to estimate standard errors and to construct confidence intervals, each bootstrap sample should be the same size as the original sample. Consider a simple example where we wish to bootstrap the coefficient on foreign from a regression of mpg on weight and foreign using the automobile data. The sample size is 74, but suppose we draw only 37 observations (half of the observed sample size) in each of 2,000 bootstrap replications.

. sysuse auto, clear
    
. set seed 3957574

. bootstrap  _b[foreign], size(37) reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000): .........10.........20.........30.........40....
> .....50.........60.........70.........80.........90.........100.........110...

(output omitted)

Linear regression                                        Number of obs =    74
                                                         Replications  = 2,000

      Command: regress mpg weight foreign
        _bs_1: _b[foreign]



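------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.661728    -0.99   0.321    -4.906956    1.606898
------------------------------------------------------------------------------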

Now consider the same exercise with the default resample size of 74 observations, the size of the original sample.

. set seed 91857785
  
. bootstrap  _b[foreign], reps(2000) dots: regress mpg weight foreign
(running regress on estimation sample)

Bootstrap replications (2,000): .........10.........20.........30.........40....
> .....50.........60.........70.........80.........90.........100.........110...

(output omitted)

Linear regression                                        Number of obs =    74
                                                         Replications  = 2,000

      Command: regress mpg weight foreign
        _bs_1: _b[foreign]



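------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.650029   1.121612    -1.47   0.141    -3.848348    .5482899
------------------------------------------------------------------------------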

As explained below, the difference in the bias estimates (not shown in the output above) is due to the random nature of the bootstrap and not the number of observations taken for each replication. However, the standard error estimates do depend on the number of observations in each replication. Here, on average, we would expect the variance estimate of _b[foreign] to be twice as large for a sample of 37 observations as that for 74 observations. This is due mainly to the form of the variance of the sample mean, s²/n.
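The factor of two can be checked roughly against the two outputs above. The following is a back-of-the-envelope sketch, assuming the variance of _b[foreign] scales like s²/n in the same way as the variance of a sample mean; the subscripts 37 and 74 are notation introduced here for the two resample sizes.

```latex
\[
\frac{\widehat{\mathrm{Var}}_{37}}{\widehat{\mathrm{Var}}_{74}}
  \approx \frac{s^2/37}{s^2/74} = 2,
\qquad
\frac{\widehat{\mathrm{se}}_{37}}{\widehat{\mathrm{se}}_{74}}
  \approx \sqrt{2} \approx 1.41 .
\]
\[
\text{Reported: }\quad
\frac{1.661728}{1.121612} \approx 1.48,
\qquad
\left(\frac{1.661728}{1.121612}\right)^{2} \approx 2.2 .
\]
```

The reported ratio is close to, but not exactly, the expected value. Some discrepancy is to be expected because the two runs used different seeds and a regression coefficient is not literally a sample mean.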

The number of observations in the original underlying dataset does not play a role in determining the number of replications required to get good bootstrap variance estimates. The dataset must have enough observations (preferably an infinite number) so that the empirical distribution can be used as an approximation to the population's true distribution.

In terms of the number of replications, there is no fixed answer such as “250” or “1,000” to the question. The right answer is that you should choose an infinite number of replications because, at a formal level, that is what the bootstrap requires. The key to the usefulness of the bootstrap is that it converges in terms of numbers of replications reasonably quickly, and so running a finite number of replications is good enough—assuming the number of replications chosen is large enough.

The above statement contains the key to choosing the right number of replications. Here is the recipe:

1. Choose a large but tolerable number of replications. Obtain the bootstrap estimates.
2. Change the random-number seed. Obtain the bootstrap estimates again, using the same number of replications.
3. Do the results change meaningfully? If so, the first number you chose was too small. Try a larger number. If results are similar enough, you probably have a large enough number. To be sure, you should probably perform step 2 a few more times, but I seldom do.
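The recipe above can be sketched in Stata as follows. This is a minimal illustration, assuming the automobile data and the regression used earlier; the two seeds, the choice of 2,000 replications, and the scalar names se1 and se2 are arbitrary choices made here for illustration, not recommendations.

```stata
sysuse auto, clear

* Step 1: pick a number of replications and obtain the bootstrap estimates
set seed 3957574
quietly bootstrap _b[foreign], reps(2000): regress mpg weight foreign
scalar se1 = _se[_bs_1]

* Step 2: change the seed and repeat with the same number of replications
set seed 91857785
quietly bootstrap _b[foreign], reps(2000): regress mpg weight foreign
scalar se2 = _se[_bs_1]

* Step 3: compare the two standard errors and judge whether they differ meaningfully
display "SE (first seed)  = " se1
display "SE (second seed) = " se2
display "relative difference = " abs(se1 - se2)/se1
```

If the relative difference is larger than the precision you need, increase the value in reps() and repeat the comparison.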

Whether results change meaningfully is a matter of judgment and has to be interpreted given the problem at hand. How accurate do you need the standard errors, confidence intervals, etc., to be? Often, a few digits of precision are good enough because, even if you had the standard error calculated perfectly, you have to ask yourself how much you believe your model in terms of all the other assumptions that went into it. For instance, in a Becker earnings model of the return to schooling, you might tell me the return is 6% with a standard error of 1, and I might believe you. If you told me the return is 6.10394884% and the standard error is .9899394, you have more precision but have not provided any additional useful information.

If you want more precision, it may take more replications than you would guess. Using the automobile data, I looked at linear regression,

. regress mpg weight foreign

and obtained the bootstrapped standard error for _b[foreign]. I did this for 20 replications, 40, 60, all the way up to 4,000. Here is a graph of the results as a function of the number of replications:

[Graph: bootstrap standard error of _b[foreign] plotted against the number of replications]

The vertical axis shows the bootstrapped standard error for _b[foreign]. Even with more than 1,000 replications, the standard error varied between 1.10 and 1.20, and 90% of the results were between 1.11 and 1.18. As a side experiment, I ran

. bootstrap _b[foreign], reps(20000): regress mpg weight foreign

twice and got reported standard errors of 1.14 and 1.16. At 40,000 replications, I got a reported standard error of 1.14.

Here is the program I used to obtain the above graph:

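  capture program drop Accum
  program Accum
          // one row per bootstrap run: standard error, bias, replications
          postfile results se bias n using sim, replace
          forvalues n=20(20)4000{
                  noisily display " `n'" _c      // progress indicator
                  quietly bootstrap  _b[foreign] e(N), reps(`n'):    ///
                      regress mpg weight foreign
                  // e(bias) is a row vector; its first element belongs to _bs_1
                  tempname bias
                  matrix `bias'=e(bias)
                  local b_bias=`bias'[1,1]
                  // number of complete replications (kept separate from loop macro n)
                  local reps=e(N_reps)
                  local se=_se[_bs_1]
                  post results (`se') (`b_bias') (`reps')
          }
          postclose results
  end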



clear
sysuse auto
set seed 12345
Accum
use sim, clear
scatter se n, xtitle("replications") ytitle("bootstrap standard error")
