**[R] bootstrap** -- Bootstrap sampling and estimation

__Syntax__

**bootstrap** *exp_list* [**,** *options* *eform_option*] **:** *command*

*options* Description
-------------------------------------------------------------------------
Main
__r__**eps(***#***)** perform *#* bootstrap replications; default is
**reps(50)**

Options
__str__**ata(***varlist***)** variables identifying strata
__si__**ze(***#***)** draw samples of size *#*; default is _N
__cl__**uster(***varlist***)** variables identifying resampling clusters
__id__**cluster(***newvar***)** create new cluster ID variable
__sa__**ving(***filename***, ...)** save results to *filename*; save statistics in
double precision; save results to *filename*
every *#* replications
**bca** compute acceleration for BCa confidence
intervals
__tie__**s** adjust BC/BCa confidence intervals for ties
**mse** use MSE formula for variance estimation

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
**notable** suppress table of results
__noh__**eader** suppress table header
__nol__**egend** suppress table legend
__v__**erbose** display the full table legend
**nodots** suppress replication dots
**dots(***#***)** display dots every *#* replications
__noi__**sily** display any output from *command*
__tr__**ace** trace *command*
__ti__**tle(***text***)** use *text* as title for bootstrap results
*display_options* control columns and column formats, row
spacing, line width, display of omitted
variables and base and empty cells, and
factor-variable labeling
*eform_option* display coefficient table in exponentiated form

Advanced
**nodrop** do not drop observations
**nowarn** do not warn when **e(sample)** is not set
**force** do not check for *weights* or **svy** commands;
seldom used
**reject(***exp***)** identify invalid results
**seed(***#***)** set random-number seed to *#*

**group(***varname***)** ID variable for groups within **cluster()**
__jack__**knifeopts(***jkopts***)** options for **jackknife**
__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------
*command* is any command that follows standard Stata syntax. *weights* are
not allowed in *command*.
**group()**, **jackknifeopts()**, and **coeflegend** do not appear in the dialog box.
See **[R] bootstrap postestimation** for features available after estimation.

__Menu__

**Statistics > Resampling > Bootstrap estimation**

__Description__

**bootstrap** performs nonparametric bootstrap estimation of specified
statistics (or expressions) for a Stata command or a user-written
program. Statistics are bootstrapped by resampling the data in memory
with replacement. **bootstrap** is designed for use with nonestimation
commands, functions of coefficients, or user-written programs. To
bootstrap coefficients, we recommend using the **vce(bootstrap)** option when
allowed by the estimation command.

**bs** and **bstrap** are synonyms for **bootstrap**.

__Options__

+------+
----+ Main +-------------------------------------------------------------

**reps(***#***)** specifies the number of bootstrap replications to be performed.
The default is 50. A total of 50-200 replications are generally
adequate for estimates of standard error and thus are adequate for
normal-approximation confidence intervals; see Mooney and Duval
(1993, 11). Estimates of confidence intervals using the percentile
or bias-corrected methods typically require 1,000 or more
replications.
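As a rough illustration of the guidance above, the following sketch requests 1,000 replications and then displays percentile confidence intervals. The statistic name **mean** and the seed value are arbitrary choices made for this example.

```stata
* Minimal sketch: 1,000 replications for a percentile CI of mean mpg
sysuse auto, clear
bootstrap mean=r(mean), reps(1000) seed(2024) nodots: summarize mpg

* Display percentile confidence intervals from the stored replicates
estat bootstrap, percentile
```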

+---------+
----+ Options +----------------------------------------------------------

**strata(***varlist***)** specifies the variables that identify strata. If this
option is specified, bootstrap samples are taken independently within
each stratum.

**size(***#***)** specifies the size of the samples to be drawn. The default is
**_N**, meaning to draw samples of the same size as the data. If
specified, *#* must be less than or equal to the number of observations
within **strata()**.

If **cluster()** is specified, the default size is the number of clusters
in the original dataset. For unbalanced clusters, resulting sample
sizes will differ from replication to replication. For cluster
sampling, *#* must be less than or equal to the number of clusters
within **strata()**.
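A small sketch of **strata()** combined with **size()** follows. It assumes the auto dataset, in which the smaller stratum of **foreign** has 22 observations, so **size(20)** satisfies the restriction described above. The seed and the statistic name are arbitrary.

```stata
* Resample independently within each level of foreign
sysuse auto, clear
tabulate foreign

* size(20) must not exceed the number of observations in any stratum
bootstrap mean=r(mean), reps(200) strata(foreign) size(20) seed(2024) nodots: summarize mpg
```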

**cluster(***varlist***)** specifies the variables that identify resampling
clusters. If this option is specified, the sample drawn during each
replication is a bootstrap sample of clusters.

**idcluster(***newvar***)** creates a new variable containing a unique identifier
for each resampled cluster. This option requires that **cluster()** also
be specified.
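The sketch below resamples whole clusters rather than individual observations. It assumes that the **nlswork** example dataset is reachable through **webuse** (which needs an Internet connection); **newid** is a hypothetical name for the new cluster identifier. The panel settings are cleared first so that repeated copies of a cluster cannot conflict with them.

```stata
* Each replication draws persons (idcode) with replacement
webuse nlswork, clear
xtset, clear

* newid uniquely labels every resampled copy of a cluster
bootstrap, reps(50) cluster(idcode) idcluster(newid) seed(2024) nodots: regress ln_wage age tenure
```

**regress** does not itself need **newid**; the option matters for commands that must tell the resampled copies of a cluster apart.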

**saving(***filename* [**,** *suboptions*]**)** creates a Stata data file (**.dta** file)
consisting of (for each statistic in *exp_list*) a variable containing
the replicates.

See prefix_saving_option for details about *suboptions*.
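For instance, the replicates can be saved and then inspected like any other dataset. In this sketch, **bsmean** is a hypothetical filename and **mpgmean** is a name chosen for the statistic; the saved variable takes the name given in *exp_list*.

```stata
* Save one variable of replicates per statistic in exp_list
sysuse auto, clear
bootstrap mpgmean=r(mean), reps(200) seed(2024) nodots saving(bsmean, replace): summarize mpg

* Load the replicates: one observation per replication
use bsmean, clear
describe
summarize mpgmean
```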

**bca** specifies that **bootstrap** estimate the acceleration of each statistic
in *exp_list*. This estimate is used to construct BCa confidence
intervals. Type **estat bootstrap, bca** to display the BCa confidence
interval generated by the **bootstrap** command.
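A minimal sketch of this two-step workflow, using the auto dataset and an arbitrary seed:

```stata
* Estimate the acceleration along with the bootstrap replicates
sysuse auto, clear
bootstrap, reps(1000) bca seed(2024) nodots: regress mpg weight foreign

* Display the BCa confidence intervals
estat bootstrap, bca
```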

**ties** specifies that **bootstrap** adjust for ties in the replicate values
when computing the median bias used to construct BC and BCa
confidence intervals.

**mse** specifies that **bootstrap** compute the variance by using deviations of
the replicates from the observed value of the statistics based on the
entire dataset. By default, **bootstrap** computes the variance by using
deviations from the average of the replicates.
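To see the effect of **mse**, one can run the same bootstrap twice with the same seed, so that both runs use identical resamples, and compare the stored standard errors. The matrix names **se_avg** and **se_mse** are hypothetical.

```stata
sysuse auto, clear

* Default: deviations from the average of the replicates
bootstrap mean=r(mean), reps(200) seed(2024) nodots: summarize mpg
matrix se_avg = e(se)

* mse: deviations from the statistic observed in the full dataset
bootstrap mean=r(mean), reps(200) seed(2024) mse nodots: summarize mpg
matrix se_mse = e(se)

matrix list se_avg
matrix list se_mse
```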

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**notable** suppresses the display of the table of results.

**noheader** suppresses the display of the table header. This option implies
**nolegend**. This option may also be specified when replaying
estimation results.

**nolegend** suppresses the display of the table legend. This option may
also be specified when replaying estimation results.

**verbose** specifies that the full table legend be displayed. By default,
coefficients and standard errors are not displayed. This option may
also be specified when replaying estimation results.

**nodots** suppresses display of the replication dots. By default, one dot
character is displayed for each successful replication. A red 'x' is
displayed if *command* returns an error or if one of the values in
*exp_list* is missing.

**dots(***#***)** displays dots every *#* replications. **dots(0)** is a synonym for
**nodots**.

**noisily** specifies that any output from *command* be displayed. This option
implies the **nodots** option.

**trace** causes a trace of the execution of *command* to be displayed. This
option implies the **noisily** option.

**title(***text***)** specifies a title to be displayed above the table of
bootstrap results. The default title is the title stored in **e(title)**
by an estimation command, or if **e(title)** is not filled in, **Bootstrap**
**results** is used. **title()** may also be specified when replaying
estimation results.
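Several of the reporting options above can be combined in one call. This sketch prints a dot every 50 replications, reports 90% confidence intervals, and sets a custom title; the title text and seed are arbitrary.

```stata
sysuse auto, clear
bootstrap mean=r(mean), reps(200) dots(50) level(90) title(Bootstrap of mean mpg) seed(2024): summarize mpg
```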

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(***%fmt***)**, **pformat(***%fmt***)**, **sformat(***%fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.

*eform_option* causes the coefficient table to be displayed in
exponentiated form; see **[R]** *eform_option*. *command* determines which
*eform_option* is allowed (**eform(***string***)** and **eform** are always allowed).

+----------+
----+ Advanced +---------------------------------------------------------

**nodrop** prevents observations outside **e(sample)** and the **if** and **in**
qualifiers from being dropped before the data are resampled.

**nowarn** suppresses the display of a warning message when *command* does not
set **e(sample)**.

**force** suppresses the restriction that *command* not specify weights or be a
**svy** command. This is a rarely used option. Use it only if you know
what you are doing.

**reject(***exp***)** identifies an expression that indicates when results should
be rejected. When *exp* is true, the resulting values are reset to
missing values.
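One plausible use, sketched here, is to discard replications in which a maximum likelihood estimator failed to converge. The example assumes that *command* stores **e(converged)**, as **logit** does; the model itself is only illustrative.

```stata
* Results are reset to missing whenever the reject() expression is true
sysuse auto, clear
bootstrap _b, reps(100) reject(e(converged) != 1) seed(2024) nodots: logit foreign mpg weight
```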

**seed(***#***)** sets the random-number seed. Specifying this option is
equivalent to typing the following command prior to calling
**bootstrap**:

**. set seed** *#*

The following options are available with **bootstrap** but are not shown in
the dialog box:

**group(***varname***)** re-creates *varname* containing a unique identifier for each
group across the resampled clusters. This option requires that
**idcluster()** also be specified.

This option is useful for maintaining unique group identifiers when
sampling clusters with replacement. Suppose that cluster 1 contains
3 groups. If the **idcluster(newclid)** option is specified and cluster
1 is sampled multiple times, **newclid** uniquely identifies each copy of
cluster 1. If **group(newgroupid)** is also specified, **newgroupid**
uniquely identifies each copy of each group.

**jackknifeopts(***jkopts***)** identifies options that are to be passed to
**jackknife** when it computes the acceleration values for the BCa
confidence intervals. This option requires the **bca** option and is
mostly used for passing the **eclass**, **rclass**, or **n(***#***)** option to
**jackknife**.
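The following sketch shows one way this might look with a user-written r-class program. **myratio** is a hypothetical program written for this example; it returns **r(N)** so that **jackknife**, given the **rclass** option, can determine the number of observations.

```stata
sysuse auto, clear

* Hypothetical r-class program: ratio of mean length to mean turn
capture program drop myratio
program define myratio, rclass
    version 14
    summarize length, meanonly
    local mlen = r(mean)
    summarize turn, meanonly
    local mturn = r(mean)
    return scalar N = r(N)
    return scalar ratio = `mlen' / `mturn'
end

* bca requires jackknife estimates; pass rclass through to jackknife
bootstrap ratio=r(ratio), reps(1000) bca jackknifeopts(rclass) seed(2024) nodots: myratio
estat bootstrap, bca
```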

**coeflegend**; see **[R] estimation options**.

__Remarks__

Typing

**. bootstrap** *exp_list***,** **reps(***#***):** *command*

executes *command* multiple times, bootstrapping the statistics in *exp_list*
by resampling observations (with replacement) from the data in memory *#*
times. This method is commonly referred to as the nonparametric
bootstrap.

*command* defines the statistical command to be executed. Most Stata
commands and user-written programs can be used with **bootstrap**, as long as
they follow standard Stata syntax; see **[U] 11 Language syntax**. If the
**bca** option is supplied, *command* must also work with **jackknife**; see **[R]**
**jackknife**. The **by** prefix may not be part of *command*.

*exp_list* specifies the statistics to be collected from the execution of
*command*. If *command* changes the contents in **e(b)**, *exp_list* is optional
and defaults to **_b**.
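The two cases can be sketched as follows. The names **mean** and **sd** are labels chosen for this example; **r(mean)** and **r(sd)** are results returned by **summarize**.

```stata
sysuse auto, clear

* Estimation command: exp_list omitted, so _b is bootstrapped
bootstrap, reps(100) seed(2024) nodots: regress mpg weight

* Nonestimation command: each statistic is named explicitly
bootstrap mean=r(mean) sd=r(sd), reps(100) seed(2024) nodots: summarize mpg
matrix list e(b)
```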

Because bootstrapping is a random process, if you want to be able to
reproduce results, set the random-number seed by specifying the **seed(***#***)**
option or by typing

**. set seed** *#*

where *#* is a seed of your choosing, before running **bootstrap**; see **[R] set**
**seed**.
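A quick way to confirm reproducibility is to run the same bootstrap twice with the same seed and compare the results. In this sketch, the seed value and the matrix names **se1** and **se2** are arbitrary.

```stata
sysuse auto, clear

bootstrap mean=r(mean), reps(100) seed(12345) nodots: summarize mpg
matrix se1 = e(se)

bootstrap mean=r(mean), reps(100) seed(12345) nodots: summarize mpg
matrix se2 = e(se)

* A relative difference of zero means the two runs agree exactly
display mreldif(se1, se2)
```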

Many estimation commands allow the **vce(bootstrap)** option. For those
commands, we recommend using **vce(bootstrap)** over **bootstrap** because the
estimation command already handles clustering and other model-specific
details for you. The **bootstrap** prefix command is intended for use with
nonestimation commands, such as **summarize**, user-written programs, or
functions of coefficients.
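For comparison, a minimal sketch of the **vce(bootstrap)** form with **regress**; the number of replications and the seed are arbitrary.

```stata
* Bootstrap standard errors requested through the estimation command itself
sysuse auto, clear
regress mpg weight foreign, vce(bootstrap, reps(100) seed(2024) nodots)
```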

__Examples__

Setup
**. sysuse auto**

Compute bootstrap estimates
**. bootstrap: regress mpg weight gear foreign**

Same as above command
**. bootstrap _b: regress mpg weight gear foreign**

Change number of replications to 100
**. bootstrap, reps(100): regress mpg weight gear foreign**

Compute acceleration to obtain BCa confidence intervals
**. bootstrap, bca: regress mpg weight gear foreign**

Save results to **bsauto** file
**. bootstrap, saving(bsauto): regress mpg weight gear foreign**

Run **bootstrap** on difference in coefficients of **weight** and **gear**
**. bootstrap diff=(_b[weight]-_b[gear]): regress mpg weight gear foreign**

Bootstrap the *t* statistic using 1,000 replications, stratifying on **foreign**,
and saving results in **bsauto** file
**. bootstrap t=r(t), reps(1000) strata(foreign) saving(bsauto, replace): ttest mpg, by(foreign) unequal**

__Stored results__

**bootstrap** stores the following in **e()**:

Scalars
**e(N)** sample size
**e(N_reps)** number of complete replications
**e(N_misreps)** number of incomplete replications
**e(N_strata)** number of strata
**e(N_clust)** number of clusters
**e(k_eq)** number of equations in **e(b)**
**e(k_exp)** number of standard expressions
**e(k_eexp)** number of extended expressions (i.e., **_b**)
**e(k_extra)** number of extra equations beyond the original
ones from **e(b)**
**e(level)** confidence level for bootstrap CIs
**e(bs_version)** version for **bootstrap** results
**e(rank)** rank of **e(V)**

Macros
**e(cmdname)** command name from *command*
**e(cmd)** same as **e(cmdname)** or **bootstrap**
**e(command)** *command*
**e(cmdline)** command as typed
**e(prefix)** **bootstrap**
**e(title)** title in estimation output
**e(strata)** strata variables
**e(cluster)** cluster variables
**e(rngstate)** random-number state used
**e(size)** from the **size(***#***)** option
**e(exp***#***)** expression for the *#*th statistic
**e(ties)** **ties**, if specified
**e(mse)** **mse**, if specified
**e(vce)** **bootstrap**
**e(vcetype)** title used to label Std. Err.
**e(properties)** **b V**

Matrices
**e(b)** observed statistics
**e(b_bs)** bootstrap estimates
**e(reps)** number of nonmissing results
**e(bias)** estimated biases
**e(se)** estimated standard errors
**e(z0)** median biases
**e(accel)** estimated accelerations
**e(ci_normal)** normal-approximation CIs
**e(ci_percentile)** percentile CIs
**e(ci_bc)** bias-corrected CIs
**e(ci_bca)** bias-corrected and accelerated CIs
**e(V)** bootstrap variance-covariance matrix
**e(V_modelbased)** model-based variance

When *exp_list* is **_b**, **bootstrap** will also carry forward most of the
results already in **e()** from *command*.
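The stored results can be inspected directly after estimation, for example:

```stata
sysuse auto, clear
bootstrap mean=r(mean), reps(200) seed(2024) nodots: summarize mpg

* Scalars can be displayed; matrices can be listed
display e(N_reps)
matrix list e(se)
matrix list e(bias)
matrix list e(ci_percentile)
```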

__Reference__

Mooney, C. Z., and R. D. Duval. 1993. *Bootstrapping: A Nonparametric*
*Approach to Statistical Inference*. Newbury Park, CA: Sage.