Stata | FAQ: Citing references for Stata's cluster-correlated robust variance estimates


Which references should I cite when using the vce(cluster clustvar) option to obtain Stata’s cluster-correlated robust estimate of variance?

Title		Citing references for Stata’s cluster-correlated robust variance estimates
Author		Roberto Gutierrez, StataCorp; David M. Drukker, StataCorp

Question

In performing my statistical analysis, I have used Stata’s _____ estimation command with the vce(cluster clustvar) option to obtain a robust variance estimate that adjusts for within-cluster correlation. A journal referee now asks that I give the appropriate reference for this calculation. Which references should I cite?

Short answer

Rogers, W. H. 1993. Regression standard errors in clustered samples. Stata Technical Bulletin 13: 19–23. Reprinted in Stata Technical Bulletin Reprints, vol. 3, 88–94.

Williams, R. L. 2000. A note on robust variance estimation for cluster-correlated data. Biometrics 56: 645–646.

Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Froot, K. A. 1989. Consistent covariance matrix estimation with cross-sectional dependence and heteroskedasticity in financial data. Journal of Financial and Quantitative Analysis 24: 333–355.

Long answer

Most of Stata’s estimation commands provide the vce(robust) option. By specifying vce(robust), one may forgo model-based variance estimates in favor of the more model-agnostic “robust” variances. Robust variances give accurate assessments of the sample-to-sample variability of the parameter estimates even when the model is misspecified. The robust variance comes under various names and within Stata is known as the Huber/White/sandwich estimate of variance. The names Huber and White refer to the seminal references for this estimator:

Huber, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press, vol. 1, 221–233.

White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–830.

The name “sandwich” refers to the mathematical form of the estimate, namely, that it is calculated as the product of three matrices: the matrix formed by taking the outer product of the observation-level likelihood/pseudolikelihood score vectors is used as the middle of these matrices (the meat of the sandwich), and this matrix is in turn pre- and postmultiplied by the usual model-based variance matrix (the bread of the sandwich).
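In symbols, the description above can be sketched as follows. The notation is an assumption made here for illustration and does not come from the FAQ itself: \hat{V} stands for the usual model-based variance matrix (the bread), u_j for the row vector of scores for observation j, and N for the number of observations. Any finite-sample multiplier that a particular command applies is left out.

```latex
\[
  \hat{V}_{\text{robust}}
    = \hat{V} \left( \sum_{j=1}^{N} u_j' \, u_j \right) \hat{V}
\]
```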

Huber (1967) and White (1980), however, do not deal with clustering. When you have clustering, the observations within a cluster can no longer be treated as independent, but the clusters themselves are independent. Here, the robust calculation generalizes straightforwardly: the meat of the sandwich is replaced by a matrix formed by taking the outer product of the cluster-level scores, where each cluster-level score is obtained by summing the observation-level scores within that cluster. See Rogers (1993) and [P] _robust for details.
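As an illustration of the cluster-level calculation, here is a minimal Python sketch for the simplest possible case: a one-parameter least-squares fit of y on x with no intercept, so that every matrix in the sandwich is a scalar. It is not Stata's implementation. The function name, variable names, and data are hypothetical, and the finite-sample adjustments that Stata applies are omitted, so the result should not be expected to match vce(cluster clustvar) output.

```python
from collections import defaultdict


def cluster_robust_variance(x, y, cluster):
    """Slope and cluster-robust variance for y = b*x + e (no intercept).

    Hypothetical helper for illustration only; no finite-sample
    adjustment is applied.
    """
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Bread: (X'X)^-1, here the scalar 1 / sum(x^2). The error-variance
    # factor of the model-based variance cancels against the scaling of
    # the likelihood scores, so it is left out.
    bread = 1.0 / sxx

    # Observation-level scores: regressor times residual.
    scores = [xi * (yi - b * xi) for xi, yi in zip(x, y)]

    # Cluster-level scores: sum the observation-level scores in each cluster.
    cluster_scores = defaultdict(float)
    for g, s in zip(cluster, scores):
        cluster_scores[g] += s

    # Meat: outer product of the cluster-level scores (a sum of squares
    # in this scalar case).
    meat = sum(s * s for s in cluster_scores.values())

    return b, bread * meat * bread


# Made-up data: four observations in two clusters.
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
cluster = ["a", "a", "b", "b"]

b, v_cluster = cluster_robust_variance(x, y, cluster)
print(b, v_cluster)
```

Passing a distinct cluster label for every observation reduces the calculation to the unclustered Huber/White estimate, because each cluster-level score is then a single observation-level score.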

This generalization for clustering is so “straightforward” that it long remained undocumented in the literature (until Froot [1989]). Indeed, Williams (2000) is simply a short note that comments on this fact and gives a short proof of the validity of the estimator:

This brief note presents a general proof that the [modified-sandwich] estimator is unbiased for cluster-correlated data regardless of the setting. The result is not new, but a simple and general reference is not readily available.

The above hints that Froot (1989) may be little known outside the econometrics community and Rogers (1993) is little known among non-Stata users. Those requiring a reference from a refereed journal can therefore cite Froot (1989) as the seminal reference or Williams (2000) for its direct statement of this result. Those wanting a reference for how the calculation is actually performed in Stata can use Rogers (1993). Also, those wanting a textbook proof can cite Wooldridge (2002, sec. 13.8.2).

Finally, although White did not explicitly consider cluster sampling, he did address the finitely correlated case in his 1984 and 1994 books. The results for clustered data can also be derived from the results in section 8.3 of White (1994).

More references

White, H. 1984. Asymptotic Theory for Econometricians. Orlando, FL: Academic Press.

White, H. 1994. Estimation, Inference and Specification Analysis. New York: Cambridge University Press.
