Wild cluster bootstrap p-values and confidence intervals for hypothesis tests about parameters from linear regression models
Support for areg, regress, and xtreg, fe
Support for Rademacher, Mammen, Webb, gamma, and normal distributions for the error weights
Support for symmetric and equal-tailed p-value criteria
Do your data have a small number of clusters or an uneven number of observations per cluster? Do you want to make inferences about parameters in a linear model? With the new wildbootstrap command, you can now use wild cluster bootstrap (WCB) in these situations.
The WCB, proposed by Cameron, Gelbach, and Miller (2008), provides an alternative to the cluster–robust variance estimator when you have either a small number of clusters or an uneven number of observations across clusters.
When we fit models with clustered observations, we often use a cluster–robust variance estimator, which relaxes the independence assumption for observations within each cluster. This estimator works well if we have many clusters and if the clusters do not differ too much in their numbers of observations. When either condition fails, the WCB may give more reliable p-values and confidence intervals.
Stata's new wildbootstrap command estimates WCB p-values and confidence intervals (CIs) for tests of simple and composite linear hypotheses about parameters from linear regression models. These statistics can be obtained when fitting linear regression models such as those fit with regress, models with a large indicator-variable set such as those fit with areg, and fixed-effects models such as those fit with xtreg, fe.
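To make the idea concrete, here is a minimal Python sketch of a wild cluster bootstrap p-value for one slope coefficient. It is not Stata's implementation: it assumes a simple regression with an intercept, Rademacher weights, a symmetric p-value, the null hypothesis imposed when generating bootstrap samples, and no small-sample correction. The function names (slope_t, wcb_pvalue) and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def slope_t(x, y, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust t statistic."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xd = [xi - xbar for xi in x]                    # demeaned regressor
    sxx = sum(d * d for d in xd)
    b = sum(d * yi for d, yi in zip(xd, y)) / sxx   # slope estimate
    a = ybar - b * xbar                             # intercept estimate
    # Sum the scores xd_i * e_i within each cluster; the sandwich variance
    # of the slope is the sum of squared cluster scores over sxx squared.
    score = defaultdict(float)
    for d, xi, yi, g in zip(xd, x, y, cluster):
        score[g] += d * (yi - a - b * xi)
    var_b = sum(s * s for s in score.values()) / sxx ** 2
    return b, b / var_b ** 0.5


def wcb_pvalue(x, y, cluster, reps=1000, seed=12345):
    """Symmetric WCB p-value for H0: slope = 0 (illustrative sketch only)."""
    rng = random.Random(seed)
    _, t_obs = slope_t(x, y, cluster)
    # Under H0 the restricted model has only an intercept, so the restricted
    # fitted value is mean(y) and the restricted residual is y - mean(y).
    ybar = sum(y) / len(y)
    resid = [yi - ybar for yi in y]
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        # One Rademacher weight per cluster: all residuals in a cluster
        # are multiplied by the same -1 or +1.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [ybar + w[g] * r for g, r in zip(cluster, resid)]
        _, t_star = slope_t(x, y_star, cluster)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return t_obs, exceed / reps


# Toy data (made up for illustration): 8 clusters of very unequal size.
sizes = [2, 3, 4, 6, 8, 10, 12, 15]
cluster = [g for g, m in enumerate(sizes) for _ in range(m)]
x = [float(i % 7) + 0.1 * g for i, g in enumerate(cluster)]
y = [1.0 + 0.5 * xi + 0.3 * ((2 * i) % 5 - 2) for i, xi in enumerate(x)]

t_obs, p = wcb_pvalue(x, y, cluster)
print(f"t = {t_obs:.2f}, WCB p-value = {p:.3f}")
```

Because the weights are drawn once per cluster rather than once per observation, each bootstrap sample keeps the within-cluster dependence of the residuals intact, which is what lets the method work with few or unbalanced clusters.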
We would like to estimate the effect of tenure on wages while accounting for clustering at the industry level. Here we use a 1988 wage dataset with only 12 clusters whose sizes vary substantially, from 4 to 817 observations, so the conditions under which the cluster–robust variance estimator is reliable are not met. We fit a linear regression and compute WCB statistics for a test that the coefficient on tenure is zero. We set the seed using rseed() for reproducibility.
    . webuse nlsw88
    (NLSW, 1988 extract)

    . wildbootstrap regress wage tenure, cluster(industry) rseed(12345)

    Performing 1,000 replications for p-value for tenure = 0 ...

    Computing confidence interval for tenure

    Lower bound: .........10.........20...... done (26)
    Upper bound: .........10.........20.... done (24)

    Wild cluster bootstrap                  Number of obs      = 2,217
    Linear regression                       Number of clusters =    12
                                            Cluster size:
    Cluster variable: industry                             min =     4
    Error weight: Rademacher                               avg = 184.8
                                                           max =   817

    ------------------------------------------------------------------
            wage |   Estimate      t   p-value   [95% conf. interval]
    -------------+----------------------------------------------------
      tenure = 0 |   .1830716   6.95     0.000   .1274023    .3258156
    ------------------------------------------------------------------
The estimated coefficient on tenure is 0.183. The WCB p-value for the test that the coefficient equals zero is less than 0.001; the 95% confidence interval is [0.127, 0.326].
Here we used the default Rademacher error weights to generate the wild bootstrap samples. Mammen, Webb, gamma, and normal weights are also available.
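For intuition about how these weight distributions differ, the following Python sketch draws one weight from each. All of them have mean 0 and variance 1. This is an illustration, not Stata's code: the function name draw_weight is hypothetical, the Mammen and Webb support points are the standard two-point and six-point definitions, and the gamma parameterization (shape 4, scale 1/2, shifted by 2) is an assumption here; see [R] wildbootstrap for the definitions Stata uses.

```python
import math
import random


def draw_weight(dist, rng):
    """Draw one cluster-level error weight with mean 0 and variance 1."""
    if dist == "rademacher":
        # -1 or +1, each with probability 1/2.
        return rng.choice((-1.0, 1.0))
    if dist == "mammen":
        # Two-point distribution: (1 - sqrt(5))/2 with probability
        # (sqrt(5) + 1)/(2 sqrt(5)), otherwise (1 + sqrt(5))/2.
        s5 = math.sqrt(5.0)
        p_low = (s5 + 1.0) / (2.0 * s5)
        return (1.0 - s5) / 2.0 if rng.random() < p_low else (1.0 + s5) / 2.0
    if dist == "webb":
        # Six points: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), each with prob. 1/6.
        magnitude = math.sqrt(rng.choice((0.5, 1.0, 1.5)))
        return magnitude * rng.choice((-1.0, 1.0))
    if dist == "gamma":
        # ASSUMPTION: gamma(shape=4, scale=1/2) minus its mean of 2.
        return rng.gammavariate(4.0, 0.5) - 2.0
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    raise ValueError(f"unknown weight distribution: {dist}")


rng = random.Random(12345)
for name in ("rademacher", "mammen", "webb", "gamma", "normal"):
    draws = [draw_weight(name, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"{name:10s} mean = {mean:6.3f}  variance = {var:5.3f}")
```

With 12 clusters, Rademacher weights allow only 2^12 = 4,096 distinct sign patterns; distributions with more support points, such as Webb's, or continuous ones allow more distinct bootstrap samples when the number of clusters is very small.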
While this example is simple, wildbootstrap is quite flexible. You can fit models with many covariates and compute WCB statistics for some or all of them. You can even specify a hypothesis involving multiple coefficients. If, for instance, you wish to test that coefficients on x1 and x2 are equal, add the test(x1=x2) option to your wildbootstrap command.
Cameron, A. C., J. B. Gelbach, and D. L. Miller. 2008. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics 90: 414–427.
Read more about wild cluster bootstrap and the supported error-weight distributions in the Stata Base Reference Manual; see [R] wildbootstrap.