Wild cluster bootstrap for linear regression

Highlights

Wild cluster bootstrap p-values and confidence intervals for hypothesis tests about parameters from linear regression models
Support for areg, regress, and xtreg, fe
Support for Rademacher, Mammen, Webb, gamma, and normal distributions for the error weights
Support for symmetric and equal-tailed p-value criteria

Do your data have a small number of clusters or an uneven number of observations per cluster? Do you want to make inferences about parameters in a linear model? With the new wildbootstrap command, you can now use wild cluster bootstrap (WCB) in these situations.

Overview

The WCB, proposed by Cameron, Gelbach, and Miller (2008), provides an alternative to the cluster–robust variance estimator when you have either a small number of clusters or an uneven number of observations across clusters.

When we fit models with clustered observations, we often use a cluster–robust variance estimator, which relaxes the independence assumption for observations within each cluster. This estimator works well if we have many clusters and if the clusters do not differ too much in their numbers of observations. When either condition fails, tests and confidence intervals based on it can be unreliable, and we may obtain more reliable inference using the WCB.

Stata's new wildbootstrap command estimates WCB p-values and confidence intervals (CIs) for tests of simple and composite linear hypotheses about parameters from linear regression models. These statistics can be obtained when fitting linear regression models such as those fit with regress, models with a large indicator-variable set such as those fit with areg, and fixed-effects models such as those fit with xtreg, fe.
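To make the idea concrete, here is a minimal Python sketch of a wild cluster bootstrap test for a single slope. It is an illustration under stated assumptions, not the algorithm that wildbootstrap itself runs: it handles only one regressor plus an intercept, imposes the null hypothesis when forming residuals, uses Rademacher weights, omits the small-sample correction to the cluster–robust variance (a constant factor that does not affect the comparison), and reports a symmetric p-value. The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def cluster_t(y, x, cluster, b0=0.0):
    """Return the OLS slope of y on x (with intercept) and its
    cluster-robust t statistic for H0: slope = b0."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xd = [xi - xbar for xi in x]
    sxx = sum(d * d for d in xd)
    b = sum(d * yi for d, yi in zip(xd, y)) / sxx
    a = ybar - b * xbar
    # Sum the score (demeaned x times residual) within each cluster.
    scores = defaultdict(float)
    for d, xi, yi, g in zip(xd, x, y, cluster):
        scores[g] += d * (yi - a - b * xi)
    se = sum(s * s for s in scores.values()) ** 0.5 / sxx
    return b, (b - b0) / se


def wild_cluster_pvalue(y, x, cluster, b0=0.0, reps=999, seed=12345):
    """Symmetric wild cluster bootstrap p-value for H0: slope = b0."""
    rng = random.Random(seed)
    b, t = cluster_t(y, x, cluster, b0)
    n = len(y)
    # Restricted fit: impose slope = b0 and estimate only the intercept.
    a_r = sum(yi - b0 * xi for yi, xi in zip(y, x)) / n
    fitted = [a_r + b0 * xi for xi in x]
    resid = [yi - f for yi, f in zip(y, fitted)]
    groups = sorted(set(cluster))
    t_star = []
    for _ in range(reps):
        # One Rademacher draw per cluster, shared by all its observations.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [f + w[g] * u for f, u, g in zip(fitted, resid, cluster)]
        t_star.append(cluster_t(y_star, x, cluster, b0)[1])
    p = sum(abs(ts) >= abs(t) for ts in t_star) / reps
    return b, t, p


# Hypothetical toy data: 4 clusters of 3 observations each.
x = [float(i) for i in range(12)]
cluster = [i // 3 for i in range(12)]
noise = [0.3, -0.2, 0.1, -0.4, 0.5, 0.2, -0.1, -0.3, 0.4, 0.0, -0.5, 0.1]
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, noise)]
b, t, p = wild_cluster_pvalue(y, x, cluster)
print(f"slope = {b:.3f}, t = {t:.2f}, WCB p-value = {p:.3f}")
```

Because every observation in a cluster shares one weight, the within-cluster dependence of the residuals is preserved in each bootstrap sample. With only 4 clusters, Rademacher weights allow just 16 distinct sign patterns, which is one reason weight distributions with more support points are sometimes preferred when clusters are very few.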

Let's see it work

We would like to estimate the effect of tenure on wages while accounting for clustering at the industry level. Here we use a wage dataset from 1988 that has only 12 clusters, with sizes ranging from 4 to 817 observations. With so few clusters and such uneven cluster sizes, the conditions under which the cluster–robust variance estimator is reliable are not met. We fit a linear regression and compute WCB statistics for a test that the coefficient on tenure is zero. We set the seed using rseed() for reproducibility.

. webuse nlsw88
(NLSW, 1988 extract)

. wildbootstrap regress wage tenure, cluster(industry) rseed(12345)

Performing 1,000 replications for p-value for tenure = 0 ...
Computing confidence interval for tenure
  Lower bound: .........10.........20...... done (26)
  Upper bound: .........10.........20.... done (24)

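Wild cluster bootstrap                         Number of obs      =  2,217
Linear regression                              Number of clusters =     12
                                               Cluster size:
Cluster variable: industry                                    min =      4
Error weight: Rademacher                                      avg =  184.8
                                                              max =    817

------------------------------------------------------------------------
        wage |   Estimate       t   p-value    [95% conf. interval]
-------------+----------------------------------------------------------
constraint   |
  tenure = 0 |   .1830716    6.95     0.000     .1274023    .3258156
------------------------------------------------------------------------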

The estimated coefficient on tenure is 0.183. The equal-tailed p-value for the test that the coefficient equals zero is less than 0.001; the confidence interval is [0.127, 0.326].
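The two p-value criteria differ in how they compare the observed t statistic with the bootstrap t statistics. The following Python sketch uses common textbook definitions; the exact formulas and tie handling in wildbootstrap may differ, so treat the function name and the definitions as assumptions and see [R] wildbootstrap for the authoritative ones.

```python
def wcb_pvalues(t_obs, t_star):
    """Return (symmetric, equal_tailed) bootstrap p-values.

    t_obs  : t statistic from the original sample
    t_star : list of t statistics from the bootstrap replications
    """
    reps = len(t_star)
    # Symmetric: share of replications more extreme in absolute value.
    symmetric = sum(abs(t) > abs(t_obs) for t in t_star) / reps
    # Equal-tailed: twice the smaller of the two one-sided tail shares.
    lower = sum(t < t_obs for t in t_star) / reps
    upper = sum(t > t_obs for t in t_star) / reps
    equal_tailed = 2 * min(lower, upper)
    return symmetric, equal_tailed


# Hypothetical bootstrap draws, skewed to the right.
draws = [-3.0, -1.0, 0.0, 1.0, 2.5, 3.0, 0.5, -0.5, 1.5, 4.0]
sym, eq = wcb_pvalues(2.0, draws)
print(sym, eq)
```

When the bootstrap distribution is symmetric around zero, the two criteria agree closely. When it is skewed, as in the toy draws here, they can differ noticeably.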

Here we used Rademacher error weights, the default, to generate the wild bootstrap samples. Mammen, Webb, gamma, and normal weights are also available.
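All of these weight distributions have mean 0 and variance 1; they differ in their number of support points and higher moments. The Python sketch below draws one weight from each, using the standard definitions from the literature. The gamma case (shape 4, scale 1/2, recentered by subtracting 2) is our assumption about the parameterization; the function name is hypothetical, and [R] wildbootstrap documents what the command actually uses.

```python
import math
import random


def draw_weight(kind, rng):
    """Draw one wild bootstrap error weight with mean 0 and variance 1."""
    if kind == "rademacher":
        # Two points, -1 and +1, each with probability 1/2.
        return rng.choice((-1.0, 1.0))
    if kind == "mammen":
        # Mammen's two-point distribution (third moment equal to 1).
        s5 = math.sqrt(5.0)
        p_low = (s5 + 1.0) / (2.0 * s5)
        return -(s5 - 1.0) / 2.0 if rng.random() < p_low else (s5 + 1.0) / 2.0
    if kind == "webb":
        # Webb's six points: +/-sqrt(1/2), +/-1, +/-sqrt(3/2), equally likely.
        return rng.choice((-1.0, 1.0)) * math.sqrt(rng.choice((0.5, 1.0, 1.5)))
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "gamma":
        # Assumed parameterization: Gamma(shape 4, scale 1/2) minus its mean.
        return rng.gammavariate(4.0, 0.5) - 2.0
    raise ValueError(f"unknown weight distribution: {kind}")


rng = random.Random(12345)
for kind in ("rademacher", "mammen", "webb", "normal", "gamma"):
    draws = [draw_weight(kind, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"{kind:>10}: mean = {mean:+.3f}, variance = {var:.3f}")
```

In a wild cluster bootstrap, one such weight is drawn per cluster in each replication and multiplies every residual in that cluster.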

Although this example is simple, wildbootstrap is quite flexible. You can fit models with many covariates and compute WCB statistics for some or all of them. You can also test a hypothesis involving multiple coefficients. For instance, to test that the coefficients on x1 and x2 are equal, add the test(x1=x2) option to your wildbootstrap command.

Reference

Cameron, A. C., J. B. Gelbach, and D. L. Miller. 2008. Bootstrap-based improvements for inference with clustered errors. Review of Economics and Statistics 90: 414–427.

Tell me more

Read more about wild cluster bootstrap and the supported error-weight distributions in the Stata Base Reference Manual; see [R] wildbootstrap.

Learn about other new features in Stata 18 for robust inference.

View all the new features in Stata 18 and, in particular, New in linear models.