## Bootstrap sampling and estimation

```
. bootstrap r(p50), reps(1000): summarize mpg, detail
```

```
. webuse auto
(1978 Automobile Data)

. bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
(running summarize on estimation sample)

Warning: Because summarize is not an estimation command or does not set
         e(sample), bootstrap has no way to determine which observations are
         used in calculating the statistics and so assumes that all
         observations are used. This means that no observations will be
         excluded from the resampling because of missing values or other
         reasons.

         If the assumption is not true, press Break, save the data, and drop
         the observations that are to be excluded. Be sure that the dataset
         in memory contains only the relevant data.

Bootstrap replications (1000)
(output omitted)

Bootstrap results                               Number of obs     =         74
                                                Replications      =      1,000

      command:  summarize mpg, detail
        _bs_1:  r(p50)
```

```
. estat bootstrap, all

Bootstrap results                               Number of obs     =         74
                                                Replications      =       1000

      command:  summarize mpg, detail
        _bs_1:  r(p50)

(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
```

```
. bootstrap r(ratio), reps(#): myratio
```

```
. bootstrap r(ratio), reps(1000) seed(4567): myratio
(running myratio on estimation sample)

Warning: Because myratio is not an estimation command or does not set
         e(sample), bootstrap has no way to determine which observations are
         used in calculating the statistics and so assumes that all
         observations are used. This means that no observations will be
         excluded from the resampling because of missing values or other
         reasons.

         If the assumption is not true, press Break, save the data, and drop
         the observations that are to be excluded. Be sure that the dataset
         in memory contains only the relevant data.

Bootstrap replications (1000)
(output omitted)

Bootstrap results                               Number of obs     =         74
                                                Replications      =      1,000

      command:  myratio
        _bs_1:  r(ratio)
```

```
. bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio
```

```
. bootstrap, reps(#): qreg price weight length foreign
```

- Bootstrap of Stata commands
- Bootstrap of user-written programs
- Standard errors and bias estimation

Stata’s programmability makes performing bootstrap sampling and
estimation possible (see Efron 1979, 1982; Efron and Tibshirani 1993; Mooney
and Duval 1993). We provide two commands to simplify bootstrap estimation.
The first, **bsample**, draws a sample with replacement from the data in
memory; **bsample** may be used in user-written programs.
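The resampling step that **bsample** performs can be sketched in a few lines. The Python below illustrates the idea only and is not Stata's implementation; the function name `bsample` is borrowed from the Stata command for readability, and the values are illustrative. It draws as many observations as the original data contain, with replacement, so some observations repeat and others are left out.

```python
import random


def bsample(data, rng):
    """Hypothetical helper named after Stata's bsample (not Stata's code).

    Returns len(data) observations drawn *with* replacement, so the
    bootstrap sample has the same size as the original data.
    """
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(1234)  # seeded so the draw is reproducible
mpg = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19]  # illustrative values only
sample = bsample(mpg, rng)
print(sample)  # same length as mpg; values may repeat
```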

It is easier, however, to perform bootstrap estimation using the
**bootstrap** prefix. **bootstrap** lets you supply an expression that is a
function of the stored results of existing commands, or you can write a
program to calculate the statistics of interest. **bootstrap** then
repeatedly draws a sample with replacement, runs the command or program on
each sample, collects the results into a new dataset, and presents the
results. Writing the calculation program is easy because every Stata
command stores the statistics it calculates.
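The loop that **bootstrap** automates (draw a sample with replacement, recompute the statistic, collect the result) can be sketched as follows. This is a hypothetical Python helper, not Stata code. It assumes the bootstrap standard error is the sample standard deviation of the replicates. The median example uses a short illustrative list rather than the full 74-observation auto dataset, so its standard error will not match the Stata output in this section.

```python
import random
import statistics


def bootstrap(data, statistic, reps, seed):
    """Minimal bootstrap loop: resample, recompute, collect.

    Returns the statistic on the original data, the list of replicates,
    and the bootstrap standard error (sample SD of the replicates).
    """
    rng = random.Random(seed)
    n = len(data)
    observed = statistic(data)  # value on the original sample
    replicates = []
    for _ in range(reps):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # n draws with replacement
        replicates.append(statistic(sample))  # collect this replicate
    return observed, replicates, statistics.stdev(replicates)


# Illustrative values only, not the full auto dataset
mpg = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19,
       14, 14, 21, 29, 16, 22, 22, 24, 19, 30]
obs, replicates, se = bootstrap(mpg, statistics.median, reps=1000, seed=1234)
print(f"observed median = {obs}, bootstrap SE = {se:.4f}")
```

Passing the statistic as a function mirrors the way **bootstrap** accepts either an expression or a user-written program: the resampling loop stays the same whatever is being estimated.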

For instance, assume that we wish to obtain the bootstrap estimate of the
standard error of the median of a variable called **mpg**. Stata's
**summarize** command calculates and displays summary statistics; with the
**detail** option, it reports means, standard deviations, skewness, kurtosis,
and various percentiles. Among those percentiles is the 50th percentile, the
median. In addition to displaying the calculated results, **summarize**
stores them, and looking in the manual, we discover that the median is
stored in **r(p50)**. To get a bootstrap estimate of its standard error, all
we need to do is type

**. bootstrap r(p50), reps(1000): summarize mpg, detail**

and **bootstrap** will do all the work for us. We'll also specify a
**seed()** option so that you can reproduce our results.

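```
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |         20   .9584585    20.87   0.000     18.12146    21.87854
------------------------------------------------------------------------------
```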
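The normal-based columns are computed from just two numbers: the observed statistic and its bootstrap standard error. As a worked check of the table above, taking the 97.5th percentile of the standard normal distribution as 1.959964:

$$
z = \frac{\hat\theta}{\widehat{\mathrm{se}}_{\mathrm{boot}}} = \frac{20}{0.9584585} \approx 20.87,
\qquad
\hat\theta \pm z_{0.975}\,\widehat{\mathrm{se}}_{\mathrm{boot}} = 20 \pm 1.959964 \times 0.9584585 \approx [\,18.12146,\ 21.87854\,]
$$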

Use **estat bootstrap** to report a table with
alternative confidence intervals and an estimate of bias.

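```
------------------------------------------------------------------------------
             |    Observed              Bootstrap
             |       Coef.       Bias   Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |          20       .174   .95845847    18.12146   21.87854   (N)
             |                                             19         22   (P)
             |                                             19         22  (BC)
------------------------------------------------------------------------------
```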
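The extra columns and rows reported by **estat bootstrap, all** can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not Stata's code:

- It estimates bias as the mean of the replicates minus the observed value.
- It takes the percentile (P) interval from the 2.5th and 97.5th percentiles of the replicates, using a simple nearest-rank rule. Stata's exact percentile definition may differ.
- It builds the bias-corrected (BC) interval in one common textbook form (Efron and Tibshirani 1993).

The data values are illustrative, so the numbers will not match the table.

```python
import random
import statistics
from math import ceil
from statistics import NormalDist


def nearest_rank(sorted_vals, p):
    """Nearest-rank percentile for 0 <= p <= 1 (may differ from Stata's rule)."""
    rank = max(1, ceil(p * len(sorted_vals)))
    return sorted_vals[rank - 1]


def bootstrap_summary(observed, replicates, level=0.95):
    """Return (bias, percentile CI, bias-corrected CI) from bootstrap replicates."""
    reps = sorted(replicates)
    alpha = 1 - level
    bias = statistics.fmean(reps) - observed  # mean of replicates minus observed
    pct = (nearest_rank(reps, alpha / 2), nearest_rank(reps, 1 - alpha / 2))
    # BC: z0 is the normal quantile of the share of replicates below the observed value
    share_below = sum(r < observed for r in reps) / len(reps)
    if share_below in (0.0, 1.0):
        raise ValueError("BC interval undefined for this set of replicates")
    nd = NormalDist()
    z0, zc = nd.inv_cdf(share_below), nd.inv_cdf(1 - alpha / 2)
    bc = (nearest_rank(reps, nd.cdf(2 * z0 - zc)),
          nearest_rank(reps, nd.cdf(2 * z0 + zc)))
    return bias, pct, bc


# Illustrative values only; each replicate is the median of one resample
data = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19,
        14, 14, 21, 29, 16, 22, 22, 24, 19, 30]
rng = random.Random(1234)
observed = statistics.median(data)
replicates = [statistics.median([data[rng.randrange(len(data))] for _ in data])
              for _ in range(1000)]
bias, pct, bc = bootstrap_summary(observed, replicates)
print(f"bias = {bias:.3f}  percentile CI = {pct}  BC CI = {bc}")
```

For a median, the replicates take only a few distinct values. This is why the (P) and (BC) intervals in the table are whole numbers (19 and 22), while the normal-based interval is not.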

For an example of when we would need to write a program, consider the case of bootstrapping the ratio of two means.

We first define the calculation routine, which we can name whatever we wish:

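```stata
capture program drop myratio    // allow the program to be redefined
program myratio, rclass
    version 14
    summarize length             // r(mean) now holds the mean of length
    local length = r(mean)
    summarize turn               // r(mean) now holds the mean of turn
    local turn = r(mean)
    return scalar ratio = `length'/`turn'    // returned as r(ratio)
end
```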

Our program calls **summarize** and stores the mean of the variable
**length** in a local macro. The program then repeats this procedure for
the second variable **turn**. Finally, the ratio of the two means is
computed and returned by our program in the stored result we call
**r(ratio)**.

With our program written, we can now obtain the bootstrap estimate by simply typing

**. bootstrap r(ratio), reps(#): myratio**

where *#* is the number of replications; **bootstrap** executes our
**myratio** program once on each of the *#* bootstrap samples. Below we
request 1,000 replications and specify a random-number seed so that you can
reproduce our results:

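```
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   4.739945   .0330492   143.42   0.000      4.67517    4.804721
------------------------------------------------------------------------------
```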

The ratio, calculated over the original sample, is 4.739945; the bootstrap
estimate of the standard error of the ratio is 0.0330492. Had we wanted to
keep the 1,000-observation dataset of bootstrapped results for subsequent
analysis, we would have typed

**. bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio**

**bootstrap** can be used with any Stata estimator or calculation command
and even with user-written calculation commands.
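The dataset that **saving()** keeps holds one value of the statistic per bootstrap sample. Below is a minimal Python sketch of that idea for the ratio of means. It assumes a CSV file stands in for Stata's dataset. The file name `mydata.csv`, the helper `ratio_of_means`, and the (length, turn) values are illustrative, not the auto data. Each draw resamples whole observations, so a car's length and turn stay paired and the dependence between the two variables is preserved.

```python
import csv
import os
import random
import statistics
import tempfile


def ratio_of_means(rows):
    """Hypothetical analogue of myratio: mean(length) / mean(turn)."""
    mean_length = statistics.fmean(r[0] for r in rows)
    mean_turn = statistics.fmean(r[1] for r in rows)
    return mean_length / mean_turn


# Illustrative (length, turn) pairs only
rows = [(186, 40), (173, 40), (168, 35), (196, 40), (222, 43),
        (218, 43), (170, 34), (200, 42), (207, 43), (200, 42)]

rng = random.Random(4567)
n = len(rows)
observed = ratio_of_means(rows)

path = os.path.join(tempfile.mkdtemp(), "mydata.csv")  # hypothetical file name
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ratio"])
    for _ in range(1000):
        # Resample whole observations so each (length, turn) pair stays together
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        writer.writerow([ratio_of_means(sample)])

# Subsequent analysis: reload the saved replicates, one per row
with open(path, newline="") as f:
    saved = [float(rec["ratio"]) for rec in csv.DictReader(f)]

print(f"observed ratio = {observed:.4f}, bootstrap SE = {statistics.stdev(saved):.4f}")
```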

We have found **bootstrap** particularly useful in obtaining estimates of
the standard errors of quantile-regression coefficients. Stata's **qreg**
command performs quantile regression and obtains the standard errors using
the method suggested by Koenker and Bassett (1978, 1982). Rogers (1992)
reports that these standard errors are satisfactory in the homoskedastic
case but that they appear to be understated in the presence of
heteroskedastic errors. One alternative is to bootstrap the estimated
coefficients to obtain the standard errors. For instance, say that you wish
to estimate a median regression of **price** on the variables **weight**,
**length**, and **foreign**. Typing **qreg price weight length foreign**
will produce the estimates along with Koenker–Bassett standard errors. To
obtain bootstrap standard errors, we could instead issue the command

**. bootstrap, reps(#): qreg price weight length foreign**

We recommend this approach so highly that Gould (1992) wrote a new command,
**bsqreg**, in Stata’s programming language to automate it further for
quantile regression. Typing **bsqreg price weight length foreign** will
also produce the bootstrapped results.
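The following small Python sketch makes concrete what bootstrapping a quantile regression involves. It fits a median regression with one regressor and computes a bootstrap standard error for the slope by resampling whole (x, y) observations.

This is not how **qreg** computes its estimates. The brute-force fit `lad_fit` (a hypothetical helper) relies on a known property of median regression: some optimal least-absolute-deviation line passes through two observations with distinct x values. The search is therefore practical only for very small samples. The data are simulated, with error spread growing in x to mimic the heteroskedastic case that Rogers (1992) describes.

```python
import itertools
import random
import statistics


def lad_fit(points):
    """Median (least-absolute-deviation) regression of y on x with an intercept.

    Searches every line through two observations with distinct x values and
    keeps the one with the smallest sum of absolute residuals.
    """
    best = None
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if x1 == x2:
            continue  # vertical line: not a valid candidate
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(abs(y - intercept - slope * x) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


rng = random.Random(7)
# Simulated data: median line y = 2 + 3x, error spread grows with x
points = [(x, 2 + 3 * x + rng.gauss(0, 1 + 0.3 * x)) for x in range(25)]

intercept, slope = lad_fit(points)
boot_slopes = []
for _ in range(200):
    sample = [points[rng.randrange(len(points))] for _ in points]  # resample (x, y) pairs
    boot_slopes.append(lad_fit(sample)[1])
se_slope = statistics.stdev(boot_slopes)
print(f"slope = {slope:.3f}, bootstrap SE = {se_slope:.3f}")
```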

### References

- Efron, B. 1979. Bootstrap methods: Another look at the jackknife. *Annals of Statistics* 7: 1–26.
- ------. 1982. *The Jackknife, the Bootstrap and Other Resampling Plans*. Philadelphia: Society for Industrial and Applied Mathematics.
- Efron, B., and R. J. Tibshirani. 1993. *An Introduction to the Bootstrap*. New York: Chapman & Hall.
- Gould, W. 1992. sg11.1: Quantile regression with bootstrapped standard errors. *Stata Technical Bulletin* 9: 19–21. Reprinted in *Stata Technical Bulletin Reprints*, vol. 2, pp. 137–139.
- Koenker, R., and G. Bassett, Jr. 1978. Asymptotic theory of least absolute error regression. *Journal of the American Statistical Association* 73: 618–622.
- ------. 1982. Robust tests for heteroskedasticity based on regression quantiles. *Econometrica* 50: 43–61.
- Mooney, C. Z., and R. D. Duval. 1993. *Bootstrapping: A Nonparametric Approach to Statistical Inference*. Newbury Park, CA: Sage.
- Rogers, W. H. 1992. sg11: Quantile regression standard errors. *Stata Technical Bulletin* 9: 16–19. Reprinted in *Stata Technical Bulletin Reprints*, vol. 2, pp. 133–137.