Title  Citing references for Stata’s cluster-correlated robust variance estimates
Author  Roberto Gutierrez, StataCorp; David M. Drukker, StataCorp

Date  February 2003; updated May 2005; minor revisions August 2007 
In performing my statistical analysis, I have used Stata’s _____ estimation command with the vce(cluster clustvar) option to obtain a robust variance estimate that adjusts for within-cluster correlation. A journal referee now asks that I give the appropriate reference for this calculation. Which references should I cite?
Most of Stata’s estimation commands provide the vce(robust) option. By specifying vce(robust), one may forgo model-based variance estimates in favor of the more model-agnostic “robust” variances. Robust variances give accurate assessments of the sample-to-sample variability of the parameter estimates even when the model is misspecified. The robust variance comes under various names and within Stata is known as the Huber/White/sandwich estimate of variance. The names Huber and White refer to the seminal references for this estimator, Huber (1967) and White (1980).
The name “sandwich” refers to the mathematical form of the estimate, namely, that it is calculated as the product of three matrices: the matrix formed by taking the outer product of the observation-level likelihood/pseudolikelihood score vectors is used as the middle of these matrices (the meat of the sandwich), and this matrix is in turn pre- and postmultiplied by the usual model-based variance matrix (the bread of the sandwich).
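As a sketch of that three-matrix product, the sandwich can be written as follows. The symbols are notation assumed here for illustration, not taken from the references: u_i is the score vector for observation i (written as a column vector), N is the number of observations, and \hat{V}_{\text{model}} is the usual model-based variance matrix. Any finite-sample multiplier that a particular command applies is omitted.

```latex
\[
\hat{V}_{\text{robust}}
  = \hat{V}_{\text{model}}
    \left( \sum_{i=1}^{N} u_i \, u_i^{\top} \right)
    \hat{V}_{\text{model}}
\]
```

The two outer factors are the bread; the sum of outer products in the middle is the meat.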
Huber (1967) and White (1980), however, do not deal with clustering. When you have clustering, the observations within a cluster may not be treated as independent, but the clusters themselves are independent. Here, the robust calculation is straightforwardly generalized by substituting the meat of the sandwich with a matrix formed by taking the outer product of the cluster-level scores, where within each cluster the cluster-level score is obtained by summing the observation-level scores. See Rogers (1993) and [P] _robust for details.
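The substitution described above can be illustrated with a minimal Python sketch for the simplest possible case: a one-regressor linear regression with no intercept, so that every matrix is a scalar. This is not Stata's implementation. The function name and arguments are hypothetical, and no finite-sample adjustment is applied, so the numbers will generally differ from what a Stata command reports.

```python
from collections import defaultdict


def sandwich_variances(x, y, cluster):
    """Slope of y = b*x + e (no intercept) with two sandwich variances.

    Returns (b, v_robust, v_cluster). Illustrative only: no
    finite-sample multiplier is applied to either variance.
    """
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Bread: for least squares the residual-variance factor in the
    # model-based variance cancels against the same factor in the
    # scores, so 1/sum(x^2) paired with scores x*e gives the same product.
    bread = 1.0 / sxx

    # Observation-level scores.
    scores = [xi * (yi - b * xi) for xi, yi in zip(x, y)]

    # Meat without clustering: sum of squared observation-level scores.
    meat_obs = sum(u * u for u in scores)

    # Meat with clustering: sum the scores within each cluster first,
    # then take the sum of squared cluster-level scores.
    totals = defaultdict(float)
    for u, g in zip(scores, cluster):
        totals[g] += u
    meat_cluster = sum(t * t for t in totals.values())

    return b, bread * meat_obs * bread, bread * meat_cluster * bread


if __name__ == "__main__":
    b, v_robust, v_cluster = sandwich_variances(
        [1, 2, 3, 4], [1, 3, 2, 5], ["a", "a", "b", "b"]
    )
    print(b, v_robust, v_cluster)
```

When every observation is its own cluster, the two variances coincide, which shows that the clustered calculation is a direct generalization of the unclustered one.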
This generalization for clustering is, in fact, so “straightforward” that it remained undocumented in the literature for a long time (until Froot [1989]). Indeed, Williams (2000) is simply a short note that comments on this fact and gives a short proof of the validity of the estimator:
This brief note presents a general proof that the [modified-sandwich] estimator is unbiased for cluster-correlated data regardless of the setting. The result is not new, but a simple and general reference is not readily available.
The above hints that Froot (1989) may be little known outside the econometrics community and Rogers (1993) is little known among non-Stata users. Those requiring a reference from a refereed journal can therefore cite Froot (1989) as the seminal reference or Williams (2000) for its direct statement of this result. Those wanting a reference for how the calculation is actually performed in Stata can use Rogers (1993). Also, those wanting a textbook proof can cite Wooldridge (2002, sec. 13.8.2).
Finally, although White did not explicitly consider cluster sampling, he did address the finitely correlated case in his 1984 and 1994 books. The results for clustered data can also be derived from the results in section 8.3 of White (1994).