Bootstrap sampling and estimation
- Bootstrap of Stata commands
- Bootstrap of user-written programs
- Standard errors and bias estimation
Stata’s programmability makes it possible to perform bootstrap sampling and
estimation (see Efron 1979, 1982; Efron and Tibshirani 1993; Mooney
and Duval 1993). We provide two commands to simplify bootstrap estimation.
bsample draws a sample with replacement from a dataset. bsample may be used
in user-written programs.
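To make concrete how bsample might be used inside user-written code, here is a minimal sketch of a hand-rolled bootstrap of the median of mpg. The 200 replications, the seed, and the variable name p50 are illustrative assumptions.

```stata
* Sketch only: a manual bootstrap of the median using bsample
webuse auto, clear
set seed 1234                       // arbitrary seed, for reproducibility

tempname memhold
tempfile results
postfile `memhold' p50 using `results'

forvalues i = 1/200 {
    preserve
    bsample                         // resample _N observations with replacement
    quietly summarize mpg, detail
    post `memhold' (r(p50))         // store this replicate's median
    restore                         // bring back the original data
}
postclose `memhold'

use `results', clear
summarize p50                       // the SD of p50 is the bootstrap standard error
```

The preserve/restore pair is needed because bsample replaces the data in memory with the resampled observations.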
It is easier, however, to perform bootstrap estimation using the bootstrap
prefix command. bootstrap lets you supply an expression that is a function
of the saved results of an existing command, or you can write a program to
calculate the statistics of interest. bootstrap then repeatedly draws a
sample with replacement, runs the command or user-written program, collects
the results into a new dataset, and presents the results.
The user-written calculation program is easy to write because every Stata
command saves the statistics it calculates.
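One way to see which statistics a command saves, without consulting the manual, is to run it once and list its returned results. The sketch below uses summarize on the auto dataset as an example.

```stata
* Sketch: inspect the results that an r-class command leaves behind
webuse auto, clear
summarize mpg, detail
return list                         // lists r(N), r(mean), r(p50), and so on
display r(p50)                      // the median, usable in a bootstrap expression
```

For estimation commands, which save their results in e() rather than r(), ereturn list plays the same role.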
For instance, assume that we wish to obtain the bootstrap estimate of the
standard error of the median of a variable called mpg. Stata has a
built-in command, summarize, that calculates and displays summary
statistics; with the detail option, it calculates means, standard
deviations, skewness, kurtosis, and various percentiles. Among those
percentiles is the 50th percentile, the median. In addition to
displaying the calculated results, summarize saves them, and looking
in the manual, we discover that the median is saved in r(p50). To get
a bootstrap estimate of its standard error, all we need to do is type
. bootstrap r(p50), reps(1000): summarize mpg, detail
and bootstrap will do all of the work for us. We'll also specify a
seed() option so that you can reproduce our results.
. webuse auto
(1978 Automobile Data)
. bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
(running summarize on estimation sample)
(output omitted)
Bootstrap results                               Number of obs      =        74
                                                Replications       =      1000

      command:  summarize mpg, detail
        _bs_1:  r(p50)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |         20     .963566    20.76   0.000     18.11145    21.88855
------------------------------------------------------------------------------
Use the estat bootstrap postestimation command to report a table with
alternative confidence intervals and an estimate of bias.
. estat bootstrap, all
Bootstrap results                               Number of obs      =        74
                                                Replications       =      1000

      command:  summarize mpg, detail
        _bs_1:  r(p50)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |          20       .187    .96356601    18.11145   21.88855   (N)
             |                                              19         22   (P)
             |                                              19         22  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
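If only one type of interval is of interest, estat bootstrap can be asked for it by name instead of all. The sketch below assumes the bootstrap results for the median are still the most recent estimation results in memory.

```stata
* Sketch: request individual interval types after bootstrap
webuse auto, clear
bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
estat bootstrap, normal             // normal-based interval only
estat bootstrap, percentile         // percentile interval only
estat bootstrap, bc                 // bias-corrected interval only
```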
For an example of when we would need to write a program, consider the case
of bootstrapping the ratio of two means.
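We first define the calculation routine, which we can name whatever we wish:
program myratio, rclass
    version 12
    * mean of the numerator variable
    summarize length
    local length = r(mean)
    * mean of the denominator variable
    summarize turn
    local turn = r(mean)
    * return the ratio of the two means in r(ratio)
    return scalar ratio = `length'/`turn'
end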
Our program calls summarize and stores the mean of the variable
length in a local macro. The program then repeats this procedure for
the second variable turn. Finally, the ratio of the two means is
computed and returned by our program in the saved result we call
r(ratio).
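Before handing the program to bootstrap, it can be worth running it once on the original data to confirm that it returns what we expect. This is a minimal sketch, assuming myratio has already been defined as above and that the auto dataset is the one in use.

```stata
* Sketch: check the program by hand on the original sample
webuse auto, clear
myratio
return list                         // should show the scalar r(ratio)
display r(ratio)                    // ratio of mean(length) to mean(turn)
```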
With our program written, we can now obtain the bootstrap estimate by simply
typing
. bootstrap r(ratio), reps(#): myratio
This means that we will execute bootstrap with our myratio
program for # replications. Below we request 1,000 replications and
specify a random-number seed so you can reproduce our results:
. bootstrap r(ratio), reps(1000) seed(4567): myratio
(running myratio on estimation sample)
(output omitted)
Bootstrap results                               Number of obs      =        74
                                                Replications       =      1000

      command:  myratio
        _bs_1:  r(ratio)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   4.739945   .0344786   137.47   0.000     4.672369    4.807522
------------------------------------------------------------------------------
The ratio, calculated over the original sample, is 4.739945; the bootstrap
estimate of the standard error of the ratio is 0.0344786. Had we wanted to
keep the 1,000-observation dataset of bootstrapped results for subsequent
analysis, we would have typed
. bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio
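As a sketch of what such subsequent analysis might look like, the saved dataset can be loaded and the replicates summarized directly. This assumes the command above wrote mydata.dta to the current directory and that the unnamed expression was stored under the default name _bs_1.

```stata
* Sketch: examine the saved bootstrap replicates
use mydata, clear
describe                            // one observation per replication
summarize _bs_1
display "bootstrap std. err. = " r(sd)
* 4.739945 is the observed ratio as displayed (rounded) in the table above
display "estimated bias      = " r(mean) - 4.739945
histogram _bs_1                     // look at the bootstrap distribution
```

The standard deviation of the replicates is what bootstrap reports as the standard error, and the difference between their mean and the observed statistic is the bias estimate shown by estat bootstrap.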
bootstrap can be used with any Stata estimator or calculation command
and even with user-written calculation commands.
We have found bootstrap particularly useful in obtaining estimates of
the standard errors of quantile-regression coefficients. Stata performs
quantile regression and obtains the standard errors using the method suggested
by Koenker and Bassett (1978, 1982). Rogers (1992) reports that these standard
errors are satisfactory in the homoskedastic case but that they appear to be
understated in the presence of heteroskedastic errors. One alternative is to
bootstrap the estimated coefficients to obtain the standard errors. For
instance, say that you wish to estimate a median regression of price
on variables weight, length, and foreign. Typing
qreg price weight length foreign will produce the estimates along
with Koenker–Bassett standard errors. To obtain bootstrap standard
errors, we could issue the command
. bootstrap, reps(#): qreg price weight length foreign
We recommend this procedure so highly that Gould (1992) wrote a command
in Stata’s programming language to further automate this procedure for
quantile regression. Typing bsqreg price weight length foreign will
also produce the bootstrapped results.
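The three approaches can be compared side by side. The sketch below uses the auto dataset; the 200 replications and the seed are arbitrary choices for illustration.

```stata
* Sketch: Koenker-Bassett versus bootstrap standard errors for median regression
webuse auto, clear
qreg price weight length foreign
bootstrap, reps(200) seed(1234): qreg price weight length foreign
set seed 1234                       // bsqreg draws its own resamples
bsqreg price weight length foreign, reps(200)
```

The coefficient estimates are the same in all three cases; only the standard errors and the quantities derived from them differ.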
References
- Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of
Statistics 7: 1–26.
- ------. 1982. The Jackknife, the Bootstrap and Other Resampling Plans.
Philadelphia: Society for Industrial and Applied Mathematics.
- Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap.
New York: Chapman & Hall.
- Gould, W. 1992. sg11.1: Quantile regression with bootstrapped standard
errors. Stata Technical Bulletin 9: 19–21. Reprinted in Stata Technical
Bulletin Reprints, vol. 2, pp. 137–139.
- Koenker, R., and G. Bassett, Jr. 1978. Asymptotic theory of least absolute
error regression. Journal of the American Statistical Association 73: 618–622.
- ------. 1982. Robust tests for heteroskedasticity based on regression
quantiles. Econometrica 50: 43–61.
- Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric
Approach to Statistical Inference. Newbury Park, CA: Sage.
- Rogers, W. H. 1992. sg11: Quantile regression standard errors. Stata
Technical Bulletin 9: 16–19. Reprinted in Stata Technical Bulletin Reprints,
vol. 2, pp. 133–137.