Bootstrap sampling and estimation

Bootstrap of Stata commands
Bootstrap of community-contributed programs
Standard errors and bias estimation

Stata’s programmability makes performing bootstrap sampling and estimation possible (see Efron 1979, 1982; Efron and Tibshirani 1993; Mooney and Duval 1993). We provide two commands to simplify bootstrap estimation. The first, bsample, draws a sample with replacement from a dataset. bsample may be used in community-contributed programs.
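As a minimal sketch of what bsample does on its own, the lines below draw one bootstrap sample from the auto data and report how often each original observation was drawn. The id variable and the seed value are illustrative choices, not part of the original example, and the // comments assume the lines are run from a do-file.

```stata
webuse auto, clear          // 1978 automobile data, 74 observations
set seed 1234               // arbitrary seed, for reproducibility only
generate long id = _n       // hypothetical identifier to track the original rows
bsample                     // replace the data in memory with _N draws made with replacement
duplicates report id        // shows how many ids were drawn once, twice, and so on
```

Because bsample replaces the data in memory, wrap it in preserve and restore when the original dataset is needed afterward.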

It is easier, however, to perform bootstrap estimation using the second command, the bootstrap prefix. bootstrap lets you supply an expression that is a function of the stored results of existing commands, or you can write a program to calculate the statistics of interest. bootstrap can then repeatedly draw a sample with replacement, run the command or your program, collect the results into a new dataset, and present the results. The community-contributed calculation program is easy to write because every Stata command saves the statistics it calculates.
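To make the steps just described concrete, here is a rough sketch of the same loop written by hand with bsample and postfile. It is not how bootstrap is implemented internally; the variable name median, the 200 replications, and the seed are assumptions chosen for illustration. Run it from a do-file, because it relies on local macros and // comments.

```stata
webuse auto, clear
set seed 1234                          // arbitrary seed

tempname results                       // handle for the results file
tempfile bsdraws                       // temporary dataset holding one row per replication
postfile `results' median using `bsdraws'

forvalues i = 1/200 {
    preserve                           // keep the original data safe
    bsample                            // draw _N observations with replacement
    quietly summarize mpg, detail      // run the command on the resample
    post `results' (r(p50))            // collect the statistic of interest
    restore                            // bring back the original data
}
postclose `results'

use `bsdraws', clear                   // one observation per replication
summarize median                       // the Std. dev. column is the bootstrap standard error
```

The bootstrap prefix handles all of this bookkeeping for you, which is why it is the recommended route.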

For instance, assume that we wish to obtain the bootstrap estimate of the standard error of the median of a variable called mpg. Stata calculates and displays summary statistics with summarize; with the detail option, it reports means, standard deviations, skewness, kurtosis, and various percentiles. Among those percentiles is the 50th percentile, which is the median. In addition to displaying the calculated results, summarize stores them, and looking in the manual, we discover that the median is stored in r(p50). To get a bootstrap estimate of its standard error, all we need to do is type

. bootstrap r(p50), reps(1000): summarize mpg, detail

and bootstrap will do all the work for us. We'll also specify a seed() option so that you can reproduce our results.

. webuse auto
  (1978 automobile data)

. bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
  (running summarize on estimation sample)


warning: summarize does not set e(sample), so no observations will be excluded from the
         resampling because of missing values or other reasons. To exclude observations,
         press Break, save the data, drop any observations that are to be excluded, and rerun
         bootstrap.

  Bootstrap replications (1000)

  (output omitted)

  Bootstrap results                                              Number of obs =   74
                                                                 Replications = 1,000

        Command:  summarize mpg, detail
          _bs_1:  r(p50)



                  Observed    Bootstrap                         Normal-based       
                coefficient   std. err.      z    P>|z|     [95% conf. interval]
  
         _bs_1          20    .9584585    20.87   0.000     18.12146    21.87854
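The expression r(p50) passed to bootstrap is an ordinary stored result, so it can be inspected directly before bootstrapping. A small sketch, using only the commands already discussed plus return list:

```stata
webuse auto, clear
summarize mpg, detail       // detail is required for the percentiles to be computed
return list                 // lists every r() result that summarize left behind
display r(p50)              // the median, which is the observed coefficient in the table above
```

Any scalar shown by return list can be used in the bootstrap expression in the same way.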

Use estat bootstrap to report a table with alternative confidence intervals and an estimate of bias.

. estat bootstrap, all

  Bootstrap results                              Number of obs      =        74
                                                 Replications       =      1000

        command:  summarize mpg, detail
          _bs_1:  r(p50)




                    Observed             Bootstrap
                 coefficient     Bias    std. err.  [95% conf. interval]
   
         _bs_1             20     .174   .95845847    18.12146   21.87854  (N)
                                                           19         22  (P)
                                                           19         22 (BC)



Key:  N: Normal
      P: Percentile
     BC: Bias-corrected
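The sketch below shows how individual intervals might be requested instead of the full table. It assumes that the bca option is specified on the bootstrap prefix, which is needed before estat bootstrap can report bias-corrected and accelerated intervals. The bias column is the mean of the bootstrap replicates minus the observed statistic. Whether BCa intervals are sensible for a median is a separate question that this sketch does not address.

```stata
webuse auto, clear
// bca asks bootstrap to also compute the acceleration needed for BCa intervals
bootstrap r(p50), reps(1000) seed(1234) bca: summarize mpg, detail

estat bootstrap, percentile    // percentile interval only
estat bootstrap, bc            // bias-corrected interval only
estat bootstrap, bca           // available because bca was specified above

matrix list e(bias)            // estimated bias stored by bootstrap
```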

For an example of when we would need to write a program, consider the case of bootstrapping the ratio of two means.

We first define the calculation routine, which we can name whatever we wish,

 program myratio, rclass
     version 17
     summarize length
     local length = r(mean)
     summarize turn
     local turn = r(mean)
     return scalar ratio = `length'/`turn'
 end

Our program calls summarize and stores the mean of the variable length in a local macro. The program then repeats this procedure for the second variable turn. Finally, the ratio of the two means is computed and returned by our program in the stored result we call r(ratio).
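Before bootstrapping, it can help to run the program once on the original data and confirm that it returns what we expect. The following sketch repeats the definition so that it runs on its own; capture program drop is added only so the block can be rerun in the same session.

```stata
capture program drop myratio   // allows the definition to be rerun without error
program myratio, rclass
    version 17
    summarize length
    local length = r(mean)
    summarize turn
    local turn = r(mean)
    return scalar ratio = `length'/`turn'
end

webuse auto, clear
myratio                        // run once on the full sample
return list                    // r(ratio) should be the only stored result
display r(ratio)
```

The value displayed is the observed coefficient that bootstrap reports for the original sample.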

With our program written, we can now obtain the bootstrap estimate by simply typing

. bootstrap r(ratio), reps(#): myratio

This means that we will execute bootstrap with our myratio program for # replications. Below we request 1,000 replications and specify a random-number seed so you can reproduce our results:

. bootstrap r(ratio), reps(1000) seed(4567): myratio
  (running myratio on estimation sample)


warning: myratio does not set e(sample), so no observations will be
         excluded from the resampling because of missing values or other reasons.
         To exclude observations, press Break, save the data, drop any observations
         that are to be excluded, and rerun bootstrap.

  Bootstrap replications (1000)

    (output omitted) 

  Bootstrap results                                    Number of obs     =         74
                                                       Replications      =      1,000

        command:  myratio
          _bs_1:  r(ratio)



                  Observed    Bootstrap                         Normal-based     
                coefficient   std. err.      z    P>|z|     [95% conf. interval]
  
         _bs_1    4.739945    .0330492   143.42   0.000      4.67517    4.804721

The ratio, calculated over the original sample, is 4.739945; the bootstrap estimate of the standard error of the ratio is 0.0330492. Had we wanted to keep the 1,000-observation dataset of bootstrapped results for subsequent analysis, we would have typed

. bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio
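As a sketch of what that subsequent analysis could look like, the block below saves the replicates and then loads them. It repeats the program definition (with quietly added to suppress output) so that it runs on its own, and it adds the replace suboption, an assumption made so the file can be overwritten on reruns. Because the expression was not named, the replicates are stored in a variable called _bs_1.

```stata
capture program drop myratio
program myratio, rclass
    version 17
    quietly summarize length
    local length = r(mean)
    quietly summarize turn
    local turn = r(mean)
    return scalar ratio = `length'/`turn'
end

webuse auto, clear
bootstrap r(ratio), reps(1000) seed(4567) saving(mydata, replace): myratio

use mydata, clear                    // one observation per replication
describe                             // shows the variable _bs_1
summarize _bs_1                      // Std. dev. is the bootstrap standard error
centile _bs_1, centile(2.5 97.5)     // approximate percentile-based 95% limits
```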

bootstrap can be used with any Stata estimator or calculation command and even with community-contributed calculation commands.

We have found bootstrap particularly useful in obtaining estimates of the standard errors of quantile-regression coefficients. Stata performs quantile regression and obtains the standard errors using the method suggested by Koenker and Bassett (1978, 1982). Rogers (1992) reports that these standard errors are satisfactory in the homoskedastic case but that they appear to be understated in the presence of heteroskedastic errors. One alternative is to bootstrap the estimated coefficients to obtain the standard errors. For instance, say that you wish to estimate a median regression of price on variables weight, length, and foreign. Typing qreg price weight length foreign will produce the estimates along with Koenker–Bassett standard errors. To obtain bootstrap standard errors, we could issue the command

. bootstrap, reps(#): qreg price weight length foreign

We recommend this procedure so highly that Gould (1992) wrote a new command in Stata’s programming language to further automate this procedure for quantile regression. Typing bsqreg price weight length foreign will also produce the bootstrapped results.
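A side-by-side sketch of the three approaches follows. The 200 replications and the seed are illustrative choices. When no expression is given and the command is an estimation command, bootstrap collects the coefficient vector _b. The two bootstrap runs are not guaranteed to produce identical standard errors, even with the same seed.

```stata
webuse auto, clear

// 1. Analytic standard errors reported by qreg itself
qreg price weight length foreign

// 2. Bootstrap standard errors through the bootstrap prefix
bootstrap, reps(200) seed(1234): qreg price weight length foreign

// 3. The dedicated command for bootstrapped quantile regression
set seed 1234
bsqreg price weight length foreign, reps(200)
```

Comparing the standard-error columns across the three tables gives a quick sense of how sensitive the inference is to the choice of method.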

References

Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1–26.

------. 1982. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.

Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall.

Gould, W. 1992. sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 137–139.

Koenker, R., and G. Bassett, Jr. 1978. Asymptotic theory of least absolute error regression. Journal of the American Statistical Association 73: 618–622.

------. 1982. Robust tests for heteroskedasticity based on regression quantiles. Econometrica 50: 43–61.

Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage.

Rogers, W. H. 1992. sg11: Quantile regression standard errors. Stata Technical Bulletin 9: 16–19. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 133–137.
