Stata 15 help for bootstrap

[R] bootstrap -- Bootstrap sampling and estimation

Syntax

bootstrap exp_list [, options eform_option] : command

options                     Description
-------------------------------------------------------------------------
Main
  reps(#)                   perform # bootstrap replications; default is
                              reps(50)

Options
  strata(varlist)           variables identifying strata
  size(#)                   draw samples of size #; default is _N
  cluster(varlist)          variables identifying resampling clusters
  idcluster(newvar)         create new cluster ID variable
  saving(filename, ...)     save results to filename; save statistics in
                              double precision; save results to filename
                              every # replications
  bca                       compute acceleration for BCa confidence
                              intervals
  ties                      adjust BC/BCa confidence intervals for ties
  mse                       use MSE formula for variance estimation

Reporting
  level(#)                  set confidence level; default is level(95)
  notable                   suppress table of results
  noheader                  suppress table header
  nolegend                  suppress table legend
  verbose                   display the full table legend
  nodots                    suppress replication dots
  dots(#)                   display dots every # replications
  noisily                   display any output from command
  trace                     trace command
  title(text)               use text as title for bootstrap results
  display_options           control columns and column formats, row
                              spacing, line width, display of omitted
                              variables and base and empty cells, and
                              factor-variable labeling
  eform_option              display coefficient table in exponentiated
                              form

Advanced
  nodrop                    do not drop observations
  nowarn                    do not warn when e(sample) is not set
  force                     do not check for weights or svy commands;
                              seldom used
  reject(exp)               identify invalid results
  seed(#)                   set random-number seed to #

  group(varname)            ID variable for groups within cluster()
  jackknifeopts(jkopts)     options for jackknife
  coeflegend                display legend instead of statistics
-------------------------------------------------------------------------
command is any command that follows standard Stata syntax.
weights are not allowed in command.
group(), jackknifeopts(), and coeflegend do not appear in the dialog box.
See [R] bootstrap postestimation for features available after estimation.

Menu

Statistics > Resampling > Bootstrap estimation

Description

bootstrap performs nonparametric bootstrap estimation of specified statistics (or expressions) for a Stata command or a user-written program. Statistics are bootstrapped by resampling the data in memory with replacement. bootstrap is designed for use with nonestimation commands, functions of coefficients, or user-written programs. To bootstrap coefficients, we recommend using the vce(bootstrap) option when allowed by the estimation command.

bs and bstrap are synonyms for bootstrap.

Options

    +------+
----+ Main +-------------------------------------------------------------

reps(#) specifies the number of bootstrap replications to be performed. The default is 50. A total of 50-200 replications are generally adequate for estimates of standard error and thus are adequate for normal-approximation confidence intervals; see Mooney and Duval (1993, 11). Estimates of confidence intervals using the percentile or bias-corrected methods typically require 1,000 or more replications.
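As a rough illustration of this guidance, the sketch below bootstraps the mean of mpg twice: once with a modest number of replications for a standard error, and once with 1,000 replications before requesting percentile confidence intervals. The auto dataset, the seed, the replication counts, and the statistic name mean are arbitrary choices made for this example.

```stata
sysuse auto, clear

* A modest number of replications is usually adequate for a standard error
bootstrap mean=r(mean), reps(100) seed(12345) nowarn: summarize mpg

* Percentile intervals typically call for 1,000 or more replications
bootstrap mean=r(mean), reps(1000) seed(12345) nowarn: summarize mpg
estat bootstrap, percentile
```

nowarn is used here only because summarize does not set e(sample).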

    +---------+
----+ Options +----------------------------------------------------------

strata(varlist) specifies the variables that identify strata. If this option is specified, bootstrap samples are taken independently within each stratum.

size(#) specifies the size of the samples to be drawn. The default is _N, meaning to draw samples of the same size as the data. If specified, # must be less than or equal to the number of observations within strata().

If cluster() is specified, the default size is the number of clusters in the original dataset. For unbalanced clusters, resulting sample sizes will differ from replication to replication. For cluster sampling, # must be less than or equal to the number of clusters within strata().
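A minimal sketch of strata() and size(), assuming the auto dataset (74 observations) and arbitrary seed and replication values chosen for illustration:

```stata
sysuse auto, clear

* Resample independently within domestic and foreign cars
bootstrap mean=r(mean), reps(200) strata(foreign) seed(2024) nowarn: summarize mpg

* Draw bootstrap samples of 50 observations instead of the default _N
bootstrap mean=r(mean), reps(200) size(50) seed(2024) nowarn: summarize mpg
```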

cluster(varlist) specifies the variables that identify resampling clusters. If this option is specified, the sample drawn during each replication is a bootstrap sample of clusters.

idcluster(newvar) creates a new variable containing a unique identifier for each resampled cluster. This option requires that cluster() also be specified.
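The following sketch shows one way cluster() and idcluster() might be combined. It assumes the nlswork example dataset (loaded with webuse, which requires an Internet connection), in which idcode identifies persons observed over several years; newid is a hypothetical variable name chosen for this example.

```stata
webuse nlswork, clear

* Resample whole persons, keeping each person's observations together
bootstrap mean=r(mean), reps(100) cluster(idcode) idcluster(newid) ///
    seed(42) nowarn: summarize ln_wage
```

summarize does not use newid. The variable matters for commands that need a unique identifier for each resampled copy of a cluster, such as panel-data estimators.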

saving(filename [, suboptions]) creates a Stata data file (.dta file) consisting of (for each statistic in exp_list) a variable containing the replicates.

See prefix_saving_option for details about suboptions.
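As a sketch of how the saved replicates might be inspected, the example below stores the replicates of a coefficient difference in bsauto.dta and then summarizes them. The filename, seed, and replication count are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Save the replicates of the coefficient difference to bsauto.dta
bootstrap diff=(_b[weight]-_b[gear_ratio]), reps(200) seed(1) ///
    saving(bsauto, replace): regress mpg weight gear_ratio foreign

* Each observation in the saved file is one replication
use bsauto, clear
summarize diff
```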

bca specifies that bootstrap estimate the acceleration of each statistic in exp_list. This estimate is used to construct BCa confidence intervals. Type estat bootstrap, bca to display the BCa confidence interval generated by the bootstrap command.
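A minimal sketch of the two steps, assuming the auto dataset and an arbitrary seed:

```stata
sysuse auto, clear

* Estimate the acceleration while bootstrapping the coefficients
bootstrap, reps(1000) bca seed(1): regress mpg weight foreign

* Display the BCa confidence intervals
estat bootstrap, bca
```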

ties specifies that bootstrap adjust for ties in the replicate values when computing the median bias used to construct BC and BCa confidence intervals.

mse specifies that bootstrap compute the variance by using deviations of the replicates from the observed value of the statistics based on the entire dataset. By default, bootstrap computes the variance by using deviations from the average of the replicates.
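To see the effect of mse, the two variance formulas can be compared on the same replicates by reusing one seed. This is a sketch assuming the auto dataset. The seed and replication count are arbitrary.

```stata
sysuse auto, clear

* Default: deviations from the average of the replicates
bootstrap mean=r(mean), reps(200) seed(7) nowarn: summarize mpg

* mse: deviations from the statistic observed on the entire dataset
bootstrap mean=r(mean), reps(200) seed(7) nowarn mse: summarize mpg
```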

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

notable suppresses the display of the table of results.

noheader suppresses the display of the table header. This option implies nolegend. This option may also be specified when replaying estimation results.

nolegend suppresses the display of the table legend. This option may also be specified when replaying estimation results.

verbose specifies that the full table legend be displayed. By default, coefficients and standard errors are not displayed. This option may also be specified when replaying estimation results.

nodots suppresses display of the replication dots. By default, one dot character is displayed for each successful replication. A red 'x' is displayed if command returns an error or if one of the values in exp_list is missing.

dots(#) displays dots every # replications. dots(0) is a synonym for nodots.

noisily specifies that any output from command be displayed. This option implies the nodots option.

trace causes a trace of the execution of command to be displayed. This option implies the noisily option.

title(text) specifies a title to be displayed above the table of bootstrap results. The default title is the title stored in e(title) by an estimation command, or if e(title) is not filled in, Bootstrap results is used. title() may also be specified when replaying estimation results.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

eform_option causes the coefficient table to be displayed in exponentiated form; see [R] eform_option. command determines which eform_option is allowed (eform(string) and eform are always allowed).

    +----------+
----+ Advanced +---------------------------------------------------------

nodrop prevents observations outside e(sample) and the if and in qualifiers from being dropped before the data are resampled.

nowarn suppresses the display of a warning message when command does not set e(sample).

force suppresses the restriction that command not specify weights or be a svy command. This is a rarely used option. Use it only if you know what you are doing.

reject(exp) identifies an expression that indicates when results should be rejected. When exp is true, the resulting values are reset to missing values.
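As an illustration only, the sketch below rejects replications in which too few nonmissing values of rep78 were drawn. The threshold of 65, the seed, and the replication count are arbitrary assumptions made for this example.

```stata
sysuse auto, clear

* rep78 has missing values, so r(N) varies across bootstrap samples
bootstrap mean=r(mean), reps(200) seed(3) nowarn ///
    reject(r(N) < 65): summarize rep78
```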

seed(#) sets the random-number seed. Specifying this option is equivalent to typing the following command prior to calling bootstrap:

. set seed #

The following options are available with bootstrap but are not shown in the dialog box:

group(varname) re-creates varname containing a unique identifier for each group across the resampled clusters. This option requires that idcluster() also be specified.

This option is useful for maintaining unique group identifiers when sampling clusters with replacement. Suppose that cluster 1 contains 3 groups. If the idcluster(newclid) option is specified and cluster 1 is sampled multiple times, newclid uniquely identifies each copy of cluster 1. If group(newgroupid) is also specified, newgroupid uniquely identifies each copy of each group.

jackknifeopts(jkopts) identifies options that are to be passed to jackknife when it computes the acceleration values for the BCa confidence intervals. This option requires the bca option and is mostly used for passing the eclass, rclass, or n(#) option to jackknife.

coeflegend; see [R] estimation options.

Remarks

Typing

. bootstrap exp_list, reps(#): command

executes command multiple times, bootstrapping the statistics in exp_list by resampling observations (with replacement) from the data in memory # times. This method is commonly referred to as the nonparametric bootstrap.

command defines the statistical command to be executed. Most Stata commands and user-written programs can be used with bootstrap, as long as they follow standard Stata syntax; see [U] 11 Language syntax. If the bca option is supplied, command must also work with jackknife; see [R] jackknife. The by prefix may not be part of command.

exp_list specifies the statistics to be collected from the execution of command. If command changes the contents in e(b), exp_list is optional and defaults to _b.
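When command is a user-written program, exp_list names the results that the program returns. The sketch below assumes a hypothetical r-class program, myratio, written only for this illustration. It returns the ratio of two means in r(ratio).

```stata
* myratio is a hypothetical program defined for this example
capture program drop myratio
program define myratio, rclass
    version 15
    quietly summarize mpg
    local num = r(mean)
    quietly summarize weight
    return scalar ratio = `num' / r(mean)
end

sysuse auto, clear
bootstrap ratio=r(ratio), reps(200) seed(10) nowarn: myratio
```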

Because bootstrapping is a random process, if you want to be able to reproduce results, set the random-number seed by specifying the seed(#) option or by typing

. set seed #

where # is a seed of your choosing, before running bootstrap; see [R] set seed.
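The two approaches can be checked against each other. In the sketch below, the seed value and the matrix names se1 and se2 are arbitrary choices for this example.

```stata
sysuse auto, clear

* Set the seed before calling bootstrap
set seed 20240101
bootstrap mean=r(mean), reps(100) nowarn: summarize mpg
matrix se1 = e(se)

* Set the same seed through the seed() option
bootstrap mean=r(mean), reps(100) seed(20240101) nowarn: summarize mpg
matrix se2 = e(se)

* Relative difference between the two standard errors
display mreldif(se1, se2)
```

If the two runs draw the same bootstrap samples, the displayed difference should be 0.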

Many estimation commands allow the vce(bootstrap) option. For those commands, we recommend using vce(bootstrap) over bootstrap because the estimation command already handles clustering and other model-specific details for you. The bootstrap prefix command is intended for use with nonestimation commands, such as summarize, user-written programs, or functions of coefficients.
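The two forms might be used as follows. This is a sketch assuming the auto dataset. The seed, the replication count, and the statistic name ratio are arbitrary.

```stata
sysuse auto, clear

* Bootstrap standard errors for the coefficients themselves
regress mpg weight foreign, vce(bootstrap, reps(200) seed(5))

* Prefix form for a function of the coefficients
bootstrap ratio=(_b[weight]/_b[foreign]), reps(200) seed(5): ///
    regress mpg weight foreign
```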

Examples

Setup
    . sysuse auto

Compute bootstrap estimates
    . bootstrap: regress mpg weight gear foreign

Same as above command
    . bootstrap _b: regress mpg weight gear foreign

Change number of replications to 100
    . bootstrap, reps(100): regress mpg weight gear foreign

Compute acceleration to obtain BCa confidence intervals
    . bootstrap, bca: regress mpg weight gear foreign

Save results to bsauto file
    . bootstrap, saving(bsauto): regress mpg weight gear foreign

Run bootstrap on difference in coefficients of weight and gear
    . bootstrap diff=(_b[weight]-_b[gear]): regress mpg weight gear foreign

Bootstrap t statistic using 1000 replications, stratifying on foreign, and saving results in bsauto file
    . bootstrap t=r(t), rep(1000) strata(foreign) saving(bsauto, replace): ttest mpg, by(foreign) unequal

Stored results

bootstrap stores the following in e():

Scalars
  e(N)                sample size
  e(N_reps)           number of complete replications
  e(N_misreps)        number of incomplete replications
  e(N_strata)         number of strata
  e(N_clust)          number of clusters
  e(k_eq)             number of equations in e(b)
  e(k_exp)            number of standard expressions
  e(k_eexp)           number of extended expressions (i.e., _b)
  e(k_extra)          number of extra equations beyond the original ones
                        from e(b)
  e(level)            confidence level for bootstrap CIs
  e(bs_version)       version for bootstrap results
  e(rank)             rank of e(V)

Macros
  e(cmdname)          command name from command
  e(cmd)              same as e(cmdname) or bootstrap
  e(command)          command
  e(cmdline)          command as typed
  e(prefix)           bootstrap
  e(title)            title in estimation output
  e(strata)           strata variables
  e(cluster)          cluster variables
  e(rngstate)         random-number state used
  e(size)             from the size(#) option
  e(exp#)             expression for the #th statistic
  e(ties)             ties, if specified
  e(mse)              mse, if specified
  e(vce)              bootstrap
  e(vcetype)          title used to label Std. Err.
  e(properties)       b V

Matrices
  e(b)                observed statistics
  e(b_bs)             bootstrap estimates
  e(reps)             number of nonmissing results
  e(bias)             estimated biases
  e(se)               estimated standard errors
  e(z0)               median biases
  e(accel)            estimated accelerations
  e(ci_normal)        normal-approximation CIs
  e(ci_percentile)    percentile CIs
  e(ci_bc)            bias-corrected CIs
  e(ci_bca)           bias-corrected and accelerated CIs
  e(V)                bootstrap variance-covariance matrix
  e(V_modelbased)     model-based variance

When exp_list is _b, bootstrap will also carry forward most of the results already in e() from command.
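A short sketch of how some of these results might be retrieved after estimation, assuming the auto dataset and an arbitrary seed and replication count:

```stata
sysuse auto, clear
bootstrap, reps(200) seed(9): regress mpg weight foreign

* Number of complete replications
display e(N_reps)

* Bootstrap standard errors and percentile confidence intervals
matrix list e(se)
matrix list e(ci_percentile)

* Everything stored in e()
ereturn list
```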

Reference

Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage.

