[R] bootstrap -- Bootstrap sampling and estimation
bootstrap exp_list [, options eform_option] : command
reps(#)                 perform # bootstrap replications; default is reps(50)
strata(varlist)         variables identifying strata
size(#)                 draw samples of size #; default is _N
cluster(varlist)        variables identifying resampling clusters
idcluster(newvar)       create new cluster ID variable
saving(filename, ...)   save results to filename; save statistics in double
                        precision; save results to filename every #
                        replications
bca                     compute acceleration for BCa confidence intervals
ties                    adjust BC/BCa confidence intervals for ties
mse                     use MSE formula for variance estimation
level(#)                set confidence level; default is level(95)
notable                 suppress table of results
noheader                suppress table header
nolegend                suppress table legend
verbose                 display the full table legend
nodots                  suppress replication dots
dots(#)                 display dots every # replications
noisily                 display any output from command
trace                   trace command
title(text)             use text as title for bootstrap results
display_options         control columns and column formats, row spacing,
                        line width, display of omitted variables and base
                        and empty cells, and factor-variable labeling
eform_option            display coefficient table in exponentiated form
nodrop                  do not drop observations
nowarn                  do not warn when e(sample) is not set
force                   do not check for weights or svy commands; seldom
                        used
reject(exp)             identify invalid results
seed(#)                 set random-number seed to #
group(varname)          ID variable for groups within cluster()
jackknifeopts(jkopts)   options for jackknife
coeflegend              display legend instead of statistics
command is any command that follows standard Stata syntax. weights are
not allowed in command.
group(), jackknifeopts(), and coeflegend do not appear in the dialog box.
See [R] bootstrap postestimation for features available after estimation.
Statistics > Resampling > Bootstrap estimation
bootstrap performs nonparametric bootstrap estimation of specified
statistics (or expressions) for a Stata command or a user-written
program. Statistics are bootstrapped by resampling the data in memory
with replacement. bootstrap is designed for use with nonestimation
commands, functions of coefficients, or user-written programs. To
bootstrap coefficients, we recommend using the vce(bootstrap) option when
allowed by the estimation command.
bs and bstrap are synonyms for bootstrap.
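To make the resampling idea concrete, here is a minimal Python sketch. It is an illustration only, not how Stata implements bootstrap, and the names bootstrap_replicates() and bootstrap_se() are hypothetical. Each replication draws _N observations with replacement and evaluates a statistic on the resample; the standard error is the spread of the replicates around their average. Passing a fixed seed plays the same role as seed(#).

```python
import random
import statistics


def bootstrap_replicates(data, stat, reps=50, seed=None):
    """Return `reps` values of `stat`, each computed on a resample of `data`.

    Every resample has len(data) observations drawn with replacement,
    mirroring the default sample size _N. `stat` is any function of a list.
    """
    rng = random.Random(seed)  # a fixed seed makes the replicates reproducible
    n = len(data)
    replicates = []
    for _ in range(reps):
        sample = rng.choices(data, k=n)  # sampling with replacement
        replicates.append(stat(sample))
    return replicates


def bootstrap_se(replicates):
    """Standard error from deviations around the average of the replicates."""
    return statistics.stdev(replicates)  # uses divisor k - 1
```

For example, bootstrap_replicates(values, statistics.mean, reps=100, seed=12345) returns 100 bootstrap means of a hypothetical list values; rerunning it with the same seed reproduces them exactly.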
----+ Main +-------------------------------------------------------------
reps(#) specifies the number of bootstrap replications to be performed.
The default is 50. Between 50 and 200 replications are generally
adequate for estimates of standard error and thus are adequate for
normal-approximation confidence intervals; see Mooney and Duval
(1993, 11). Estimates of confidence intervals using the percentile
or bias-corrected methods typically require 1,000 or more
replications.
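The difference between the two kinds of interval can be sketched in Python as follows. The helper names are hypothetical and this is not Stata's code. The normal-approximation interval needs only a standard error, whereas the percentile interval reads its limits from the tails of the sorted replicates. Under the simple nearest-rank rule assumed here, a 95% interval from 50 replicates is set by the second-smallest and second-largest values, which is why percentile and bias-corrected intervals call for many more replications.

```python
import math
from statistics import NormalDist, stdev


def normal_ci(observed, replicates, level=95):
    """Normal-approximation CI: observed value +/- z * bootstrap SE."""
    z = NormalDist().inv_cdf(0.5 + level / 200)  # about 1.96 for level 95
    se = stdev(replicates)  # spread around the average of the replicates
    return observed - z * se, observed + z * se


def percentile_ci(replicates, level=95):
    """Percentile CI read directly off the sorted replicates.

    Uses a simple nearest-rank rule (an assumption); other software may
    choose or interpolate order statistics differently.
    """
    ordered = sorted(replicates)
    k = len(ordered)
    lo_rank = math.ceil((100 - level) * k / 200)  # rank of the lower limit
    hi_rank = math.ceil((100 + level) * k / 200)  # rank of the upper limit
    return ordered[max(lo_rank, 1) - 1], ordered[min(hi_rank, k) - 1]
```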
----+ Options +----------------------------------------------------------
strata(varlist) specifies the variables that identify strata. If this
option is specified, bootstrap samples are taken independently within
each stratum.
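A minimal Python sketch of stratified resampling, assuming each observation is a record and that stratum_of() (a hypothetical helper) returns its stratum; this illustrates the idea rather than Stata's code. Every stratum is resampled on its own, so the number of observations per stratum is the same in each bootstrap sample.

```python
import random
from collections import defaultdict


def stratified_resample(rows, stratum_of, rng):
    """Resample rows with replacement independently within each stratum.

    Each stratum keeps its original number of observations, so stratum
    sizes are identical in every bootstrap sample.
    """
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[stratum_of(row)].append(row)
    sample = []
    for members in by_stratum.values():
        # draw len(members) rows with replacement from this stratum only
        sample.extend(rng.choices(members, k=len(members)))
    return sample
```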
size(#) specifies the size of the samples to be drawn. The default is
_N, meaning to draw samples of the same size as the data. If
specified, # must be less than or equal to the number of observations
within strata.
If cluster() is specified, the default size is the number of clusters
in the original dataset. For unbalanced clusters, resulting sample
sizes will differ from replication to replication. For cluster
sampling, # must be less than or equal to the number of clusters
within strata.
cluster(varlist) specifies the variables that identify resampling
clusters. If this option is specified, the sample drawn during each
replication is a bootstrap sample of clusters.
idcluster(newvar) creates a new variable containing a unique identifier
for each resampled cluster. This option requires that cluster() also
be specified.
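The following Python sketch shows cluster resampling together with the kind of identifier idcluster() creates. It is an illustration under simplifying assumptions (strata are ignored; cluster_resample() and cluster_of() are hypothetical names), not Stata's implementation. Whole clusters are drawn with replacement and each drawn copy receives its own new ID. Because clusters may differ in size, the number of observations varies from one replication to the next.

```python
import random
from collections import defaultdict


def cluster_resample(rows, cluster_of, rng, size=None):
    """Draw `size` clusters with replacement and keep all rows of each draw.

    Returns (new_id, row) pairs. new_id numbers each drawn copy of a
    cluster 1, 2, ..., so repeated copies stay distinguishable (the role
    idcluster() plays). `size` defaults to the number of clusters.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[cluster_of(row)].append(row)
    keys = list(clusters)
    if size is None:
        size = len(keys)
    if size > len(keys):
        raise ValueError("size must not exceed the number of clusters")
    drawn = rng.choices(keys, k=size)  # clusters, not rows, are resampled
    sample = []
    for new_id, key in enumerate(drawn, start=1):
        sample.extend((new_id, row) for row in clusters[key])
    return sample
```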
saving(filename [, suboptions]) creates a Stata data file (.dta file)
consisting of (for each statistic in exp_list) a variable containing
the replicates.
See prefix_saving_option for details about suboptions.
bca specifies that bootstrap estimate the acceleration of each statistic
in exp_list. This estimate is used to construct BCa confidence
intervals. Type estat bootstrap, bca to display the BCa confidence
interval generated by the bootstrap command.
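The acceleration is estimated with the jackknife, which is why command must also work with jackknife when bca is specified. The Python sketch below uses the standard leave-one-out formula a = sum(d_i^3) / (6 * (sum(d_i^2))^(3/2)), where d_i is the average of the leave-one-out estimates minus the ith leave-one-out estimate. Treat it as an illustration of the textbook formula rather than Stata's exact code; jackknife_acceleration() is a hypothetical name.

```python
def jackknife_acceleration(data, stat):
    """Acceleration estimate for BCa intervals from leave-one-out values.

    `data` is a list and `stat` a function of a list. Returns 0.0 when all
    leave-one-out estimates are equal (no spread to measure).
    """
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]  # drop obs i
    center = sum(loo) / n
    d = [center - v for v in loo]
    num = sum(x ** 3 for x in d)  # skewness-type numerator
    den = 6 * sum(x ** 2 for x in d) ** 1.5
    return num / den if den > 0 else 0.0
```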
ties specifies that bootstrap adjust for ties in the replicate values
when computing the median bias used to construct BC and BCa
confidence intervals.
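To show what the ties adjustment changes, here is a Python sketch of the median bias z0 and of the percentile levels used for BC and BCa limits. It assumes the usual definitions: z0 is the inverse standard normal CDF of the share of replicates at or below the observed value, and with ties requested, replicates exactly equal to the observed value count as one-half. The names median_bias() and adjusted_levels() are hypothetical. Setting accel to 0 gives the BC levels; a nonzero acceleration gives the BCa levels.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def median_bias(observed, replicates, ties=False):
    """z0 = inverse normal CDF of the share of replicates <= observed.

    With ties=True, replicates equal to the observed value count one-half.
    Raises StatisticsError if all replicates lie on one side.
    """
    below = sum(1 for v in replicates if v < observed)
    equal = sum(1 for v in replicates if v == observed)
    count = below + (equal / 2 if ties else equal)
    return _STD_NORMAL.inv_cdf(count / len(replicates))


def adjusted_levels(z0, level=95, accel=0.0):
    """Percentile levels (proportions) for BC (accel=0) or BCa limits."""
    z = _STD_NORMAL.inv_cdf(0.5 + level / 200)
    levels = []
    for zq in (-z, z):  # lower, then upper limit
        w = z0 + zq
        levels.append(_STD_NORMAL.cdf(z0 + w / (1 - accel * w)))
    return tuple(levels)
```

The returned proportions are then used as percentiles of the sorted replicates to obtain the interval limits.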
mse specifies that bootstrap compute the variance by using deviations of
the replicates from the observed values of the statistics computed from
the entire dataset. By default, bootstrap computes the variance by using
deviations from the average of the replicates.
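The two variance formulas can be written side by side in Python. This sketch assumes the common divisors, k - 1 for deviations from the average of the k replicates and k for the mse deviations from the full-sample value; the exact scaling is an assumption, and bootstrap_variance() is a hypothetical name.

```python
def bootstrap_variance(replicates, observed=None, mse=False):
    """Bootstrap variance of one statistic from its k replicates."""
    k = len(replicates)
    if mse:
        if observed is None:
            raise ValueError("mse needs the full-sample (observed) estimate")
        # deviations from the statistic computed on the entire dataset
        return sum((v - observed) ** 2 for v in replicates) / k
    # default: deviations from the average of the replicates
    mean = sum(replicates) / k
    return sum((v - mean) ** 2 for v in replicates) / (k - 1)
```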
----+ Reporting +--------------------------------------------------------
level(#); see [R] estimation options.
notable suppresses the display of the table of results.
noheader suppresses the display of the table header. This option implies
nolegend. This option may also be specified when replaying estimation
results.
nolegend suppresses the display of the table legend. This option may
also be specified when replaying estimation results.
verbose specifies that the full table legend be displayed. By default,
coefficients and standard errors are not displayed. This option may
also be specified when replaying estimation results.
nodots suppresses display of the replication dots. By default, one dot
character is displayed for each successful replication. A red 'x' is
displayed if command returns an error or if one of the values in
exp_list is missing.
dots(#) displays dots every # replications. dots(0) is a synonym for
nodots.
noisily specifies that any output from command be displayed. This option
implies the nodots option.
trace causes a trace of the execution of command to be displayed. This
option implies the noisily option.
title(text) specifies a title to be displayed above the table of
bootstrap results. The default title is the title stored in e(title)
by an estimation command, or if e(title) is not filled in, Bootstrap
results is used. title() may also be specified when replaying
estimation results.
display_options: noci, nopvalues, noomitted, vsquish, noemptycells,
baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style),
cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R]
estimation options.
eform_option causes the coefficient table to be displayed in
exponentiated form; see [R] eform_option. command determines which
eform_option is allowed (eform(string) and eform are always allowed).
----+ Advanced +---------------------------------------------------------
nodrop prevents observations outside e(sample) and the if and in
qualifiers from being dropped before the data are resampled.
nowarn suppresses the display of a warning message when command does not
set e(sample).
force suppresses the restriction that command not specify weights or be a
svy command. This is a rarely used option. Use it only if you know
what you are doing.
reject(exp) identifies an expression that indicates when results should
be rejected. When exp is true, the resulting values are reset to
missing values.
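A Python sketch of the reject() logic, assuming each replication yields a record of results (a dict) and that the rejection rule is a predicate on that record; apply_reject() and the converged key used below are hypothetical. Rejected or missing values become NaN, and the counts of complete and incomplete replications correspond in spirit to e(N_reps) and e(N_misreps).

```python
import math


def apply_reject(results, stat_name, reject):
    """Collect one statistic, resetting it to missing (NaN) when rejected.

    `results` holds one dict per replication; `reject` is a predicate on
    that dict, standing in for the reject(exp) expression. Returns the
    values plus counts of complete and incomplete replications.
    """
    values = []
    for res in results:
        value = res.get(stat_name, math.nan)  # absent result counts as missing
        values.append(math.nan if reject(res) else value)
    n_missing = sum(1 for v in values if math.isnan(v))
    return values, len(values) - n_missing, n_missing
```

For example, apply_reject(results, "t", lambda r: r["converged"] == 0) discards every replicate whose hypothetical converged flag is 0.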
seed(#) sets the random-number seed. Specifying this option is
equivalent to typing the following command prior to calling bootstrap:
. set seed #
The following options are available with bootstrap but are not shown in
the dialog box:
group(varname) re-creates varname containing a unique identifier for each
group across the resampled clusters. This option requires that
idcluster() also be specified.
This option is useful for maintaining unique group identifiers when
sampling clusters with replacement. Suppose that cluster 1 contains
3 groups. If the idcluster(newclid) option is specified and cluster
1 is sampled multiple times, newclid uniquely identifies each copy of
cluster 1. If group(newgroupid) is also specified, newgroupid
uniquely identifies each copy of each group.
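The scenario above can be sketched in Python. Given the original cluster IDs drawn in one replication (with repeats), relabel_copies(), a hypothetical helper, assigns a new cluster ID to each drawn copy, as idcluster() does, and a new group ID to each copy of each group, as group() does. The sequential numbering is an assumption; only the uniqueness matters.

```python
def relabel_copies(rows, cluster_of, group_of, drawn):
    """Give each copy of each drawn cluster and group a fresh identifier.

    `drawn` lists the original cluster IDs picked in one replication,
    with repeats. Returns (newclid, newgroupid, row) triples.
    """
    out = []
    next_group = 0
    for newclid, cluster in enumerate(drawn, start=1):
        group_ids = {}  # original group -> new ID within this copy only
        for row in rows:
            if cluster_of(row) != cluster:
                continue
            g = group_of(row)
            if g not in group_ids:
                next_group += 1
                group_ids[g] = next_group
            out.append((newclid, group_ids[g], row))
    return out
```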
jackknifeopts(jkopts) identifies options that are to be passed to
jackknife when it computes the acceleration values for the BCa
confidence intervals. This option requires the bca option and is
mostly used for passing the eclass, rclass, or n(#) option to
jackknife; see [R] jackknife.
coeflegend; see [R] estimation options.
. bootstrap exp_list, reps(#): command
executes command multiple times, bootstrapping the statistics in exp_list
by resampling observations (with replacement) from the data in memory #
times. This method is commonly referred to as the nonparametric
bootstrap.
command defines the statistical command to be executed. Most Stata
commands and user-written programs can be used with bootstrap, as long as
they follow standard Stata syntax; see [U] 11 Language syntax. If the
bca option is supplied, command must also work with jackknife; see [R]
jackknife. The by prefix may not be part of command.
exp_list specifies the statistics to be collected from the execution of
command. If command changes the contents of e(b), exp_list is optional
and defaults to _b.
Because bootstrapping is a random process, if you want to be able to
reproduce results, set the random-number seed by specifying the seed(#)
option or by typing
. set seed #
where # is a seed of your choosing, before running bootstrap; see [R] set
seed.
Many estimation commands allow the vce(bootstrap) option. For those
commands, we recommend using vce(bootstrap) over bootstrap because the
estimation command already handles clustering and other model-specific
details for you. The bootstrap prefix command is intended for use with
nonestimation commands, such as summarize, user-written programs, or
functions of coefficients.
. sysuse auto
Compute bootstrap estimates
. bootstrap: regress mpg weight gear foreign
Same as above command
. bootstrap _b: regress mpg weight gear foreign
Change number of replications to 100
. bootstrap, reps(100): regress mpg weight gear foreign
Compute acceleration to obtain BCa confidence intervals
. bootstrap, bca: regress mpg weight gear foreign
Save results to bsauto file
. bootstrap, saving(bsauto): regress mpg weight gear foreign
Run bootstrap on difference in coefficients of weight and gear
. bootstrap diff=(_b[weight]-_b[gear]): regress mpg weight gear
Bootstrap t statistic using 1,000 replications, stratifying on foreign,
and saving results in bsauto file
. bootstrap t=r(t), rep(1000) strata(foreign) saving(bsauto,
replace): ttest mpg, by(foreign) unequal
bootstrap stores the following in e():
Scalars
    e(N)                sample size
    e(N_reps)           number of complete replications
    e(N_misreps)        number of incomplete replications
    e(N_strata)         number of strata
    e(N_clust)          number of clusters
    e(k_eq)             number of equations in e(b)
    e(k_exp)            number of standard expressions
    e(k_eexp)           number of extended expressions (i.e., _b)
    e(k_extra)          number of extra equations beyond the original
                        ones from e(b)
    e(level)            confidence level for bootstrap CIs
    e(bs_version)       version for bootstrap results
    e(rank)             rank of e(V)
Macros
    e(cmdname)          command name from command
    e(cmd)              same as e(cmdname) or bootstrap
    e(cmdline)          command as typed
    e(title)            title in estimation output
    e(strata)           strata variables
    e(cluster)          cluster variables
    e(rngstate)         random-number state used
    e(size)             from the size(#) option
    e(exp#)             expression for the #th statistic
    e(ties)             ties, if specified
    e(mse)              mse, if specified
    e(vcetype)          title used to label Std. Err.
    e(properties)       b V
Matrices
    e(b)                observed statistics
    e(b_bs)             bootstrap estimates
    e(reps)             number of nonmissing results
    e(bias)             estimated biases
    e(se)               estimated standard errors
    e(z0)               median biases
    e(accel)            estimated accelerations
    e(ci_normal)        normal-approximation CIs
    e(ci_percentile)    percentile CIs
    e(ci_bc)            bias-corrected CIs
    e(ci_bca)           bias-corrected and accelerated CIs
    e(V)                bootstrap variance-covariance matrix
    e(V_modelbased)     model-based variance
When exp_list is _b, bootstrap will also carry forward most of the
results already in e() from command.
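Several of the matrices above are simple functions of the replicates. This Python sketch computes the analogues of e(b_bs), e(bias), and e(reps) for one statistic. It uses a hypothetical summarize_replicates(), with NaN standing in for a missing replicate, and assumes bias is defined as the average of the replicates minus the observed value.

```python
import math


def summarize_replicates(observed, replicates):
    """Mimic a few stored results for a single statistic.

    b_bs is the mean of the nonmissing replicates, bias = b_bs - observed,
    and reps counts the nonmissing replicates (NaN marks a missing one).
    """
    good = [v for v in replicates if not math.isnan(v)]
    b_bs = sum(good) / len(good)
    return {"b": observed, "b_bs": b_bs, "bias": b_bs - observed, "reps": len(good)}
```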
Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric
Approach to Statistical Inference. Newbury Park, CA: Sage.