Which references should I cite when using the vce(cluster clustvar) option to obtain
Stata’s cluster-correlated robust estimate of variance?
Title: Citing references for Stata’s cluster-correlated robust variance estimates
Author: Roberto Gutierrez, StataCorp; David M. Drukker, StataCorp
Date: February 2003; updated May 2005; minor revisions August 2007
Question
In performing my statistical analysis, I have used Stata’s
_____ estimation command with the
vce(cluster clustvar)
option to obtain a robust variance estimate that adjusts for
within-cluster correlation. A journal referee now asks that I give the
appropriate reference for this calculation. Which references should I cite?
Short answer
- Rogers, W. H. 1993. Regression standard errors in clustered samples. Stata Technical Bulletin 13: 19–23. Reprinted in Stata Technical Bulletin Reprints, vol. 3, 88–94.
- Williams, R. L. 2000. A note on robust variance estimation for cluster-correlated data. Biometrics 56: 645–646.
- Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
- Froot, K. A. 1989. Consistent covariance matrix estimation with cross-sectional dependence and heteroskedasticity in financial data. Journal of Financial and Quantitative Analysis 24: 333–355.
Long answer
Most of Stata’s estimation commands provide the
vce(robust) option. By specifying
vce(robust), one may forgo model-based variance
estimates in favor of the more model-agnostic “robust”
variances. Robust variances give accurate assessments of the
sample-to-sample variability of the parameter estimates even when the model
is misspecified. The robust variance comes under various names and within
Stata is known as the Huber/White/sandwich estimate of variance. The names
Huber and White refer to the seminal references for this estimator:
- Huber, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press, vol. 1, 221–233.
- White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–830.
The name “sandwich” refers to the mathematical form of the
estimate, which is calculated as the product of three matrices. The middle
matrix (the meat of the sandwich) is the sum of the outer products of the
observation-level likelihood/pseudolikelihood score vectors. This matrix is
pre- and postmultiplied by the usual model-based variance matrix (the bread
of the sandwich).
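As a sketch of the product just described, the estimate can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the original: u_i denotes the score (row) vector for observation i, N the number of observations, and \hat{V} the usual model-based variance estimate. Any finite-sample multiplier that a particular command applies is omitted.

```latex
% Assumed notation (not in the original): u_i is the 1 x k score (row) vector
% for observation i, N is the number of observations, and \hat{V} is the
% usual model-based variance estimate. Finite-sample multipliers are omitted.
\[
  \hat{V}_{\text{robust}}
  = \underbrace{\hat{V}}_{\text{bread}}
    \underbrace{\Bigl( \sum_{i=1}^{N} u_i^{\prime} u_i \Bigr)}_{\text{meat}}
    \underbrace{\hat{V}}_{\text{bread}}
\]
```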
Huber (1967) and White (1980), however, do not deal with clustering. When
you have clustering, the observations within a cluster cannot be treated as
independent, but the clusters themselves are independent. Here, the robust
calculation generalizes straightforwardly: the meat of the sandwich is
replaced with the sum of the outer products of the cluster-level scores,
where each cluster-level score is obtained by summing the observation-level
scores within that cluster. See Rogers (1993) and [P] _robust for details.
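The cluster-level summation can be illustrated with a minimal sketch in plain Python. It is not Stata's implementation: it assumes the simplest possible model (least squares of y on a single regressor x with no intercept), uses 1/sum(x^2) as the bread, and omits any finite-sample adjustment. The function name sandwich_variances and the toy data are hypothetical.

```python
from collections import defaultdict


def sandwich_variances(x, y, cluster):
    """Illustrative only: fit y = beta * x (no intercept) by least squares.

    Returns (beta, robust_var, cluster_var), where both variances use the
    bread * meat * bread form with no finite-sample adjustment.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Observation-level scores: regressor times residual.
    scores = [xi * (yi - beta * xi) for xi, yi in zip(x, y)]

    # Bread: inverse of sum(x^2), a scalar in this one-parameter case.
    bread = 1.0 / sxx

    # Meat for independent observations: sum of squared scores.
    meat_obs = sum(s * s for s in scores)

    # Meat for clustered data: sum the scores within each cluster first,
    # then take the sum of squares of those cluster-level scores.
    cluster_scores = defaultdict(float)
    for g, s in zip(cluster, scores):
        cluster_scores[g] += s
    meat_clu = sum(s * s for s in cluster_scores.values())

    return beta, bread * meat_obs * bread, bread * meat_clu * bread


# Toy data: residuals share a sign within each cluster.
beta, v_robust, v_cluster = sandwich_variances(
    x=[1, 1, 1, 1], y=[2, 2, 0, 0], cluster=["a", "a", "b", "b"]
)
print(beta, v_robust, v_cluster)
```

In this toy data the residuals are positively correlated within each cluster, so the clustered variance comes out larger than the unclustered one. When every observation is its own cluster, the two calculations coincide.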
This generalization for clustering is, in fact, so
“straightforward” that it remained undocumented in the literature
for a long time (until Froot [1989]). Indeed, Williams (2000) is simply a
short note that comments on this fact and gives a short proof of the
validity of the estimator:
“This brief note presents a general proof that the [modified-sandwich]
estimator is unbiased for cluster-correlated data regardless of the setting.
The result is not new, but a simple and general reference is not readily
available.”
The above hints that Froot (1989) may be little known outside the
econometrics community and Rogers (1993) is little known among non-Stata
users. Those requiring a reference from a refereed journal can therefore
cite Froot (1989) as the seminal reference or Williams (2000) for its direct
statement of this result. Those wanting a reference for how the calculation
is actually performed in Stata can use Rogers (1993). Also, those wanting a
textbook proof can cite Wooldridge (2002, sec. 13.8.2).
Finally, although White did not explicitly consider cluster sampling, he did
address the finitely correlated case in his 1984 and 1994 books. The
results for clustered data can also be derived from the results in section
8.3 of White (1994).
More references
- White, H. 1984. Asymptotic Theory for Econometricians. Orlando, FL: Academic Press.
- White, H. 1994. Estimation, Inference and Specification Analysis. New York: Cambridge University Press.