Title: Comparison of standard errors for robust, cluster, and standard estimators
Author: William Sribney, StataCorp
Date: July 1998; minor revisions July 2009; minor revisions July 2013
I ran a regression with data for clients clustered by therapist. I first estimated the regression without using the vce(cluster clustvar) option, then I re-ran it using the vce(cluster clustvar) option. In many cases, the standard errors were much smaller when I used the vce(cluster clustvar) option. Does this seem reasonable?
The short answer is that this can happen when the intracluster correlations are negative.
Let me back up and explain the mechanics of what can happen to the standard errors.
Let’s consider the following three estimators available with the regress command: (1) the ordinary least squares (OLS) estimator, (2) the robust estimator obtained when the vce(robust) option is specified (without the vce(cluster clustvar) option), and (3) the robust cluster estimator obtained when the vce(cluster clustvar) option is specified.
The formulas for the estimators are
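$$V_{\text{OLS}} = s^2 \, (X'X)^{-1}$$

where

$$s^2 = \frac{1}{N - k} \sum_{i=1}^{N} e_i^2$$

$$V_{\text{rob}} = (X'X)^{-1} \left[ \sum_{i=1}^{N} (e_i x_i)' (e_i x_i) \right] (X'X)^{-1}$$

$$V_{\text{cluster}} = (X'X)^{-1} \left[ \sum_{j=1}^{n_c} u_j' u_j \right] (X'X)^{-1}$$

where

$$u_j = \sum_{i \in \text{cluster } j} e_i x_i$$

and $n_c$ is the total number of clusters.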
Above, $e_i$ is the residual for the $i$th observation and $x_i$ is a row vector of predictors including the constant.
For simplicity, I omitted the multipliers (which are close to 1) from the formulas for $V_{\text{rob}}$ and $V_{\text{cluster}}$.
The formula for the clustered estimator is simply that of the robust (unclustered) estimator with the individual $e_i x_i$ terms replaced by their sums over each cluster.
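The three formulas can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Stata's implementation: it handles a model with a constant and a single predictor, it omits the multipliers just as the formulas above do, and the function names (`inv2`, `matmul2`, `outer_sum`, `three_variances`) and the data are hypothetical.

```python
def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def outer_sum(vectors):
    """Sum of v'v over length-2 row vectors v (a 2x2 matrix)."""
    return [[sum(v[i] * v[j] for v in vectors) for j in range(2)]
            for i in range(2)]


def three_variances(x, y, cluster):
    """Return (V_OLS, V_rob, V_cluster) for y regressed on a constant and x.

    Multipliers are omitted, so the robust results will differ slightly
    from what a statistical package reports.
    """
    n, k = len(x), 2
    rows = [(1.0, xi) for xi in x]             # x_i = (constant, predictor)
    xtx_inv = inv2(outer_sum(rows))            # (X'X)^{-1}
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(2)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(2)) for i in range(2)]
    e = [yi - (beta[0] * r[0] + beta[1] * r[1]) for r, yi in zip(rows, y)]

    s2 = sum(ei ** 2 for ei in e) / (n - k)
    v_ols = [[s2 * v for v in row] for row in xtx_inv]

    scores = [(ei * r[0], ei * r[1]) for ei, r in zip(e, rows)]  # e_i * x_i
    v_rob = matmul2(matmul2(xtx_inv, outer_sum(scores)), xtx_inv)

    sums = {}                                  # u_j: score sums per cluster
    for c, s in zip(cluster, scores):
        u = sums.setdefault(c, [0.0, 0.0])
        u[0] += s[0]
        u[1] += s[1]
    middle = outer_sum(list(sums.values()))
    v_cluster = matmul2(matmul2(xtx_inv, middle), xtx_inv)
    return v_ols, v_rob, v_cluster


# Hypothetical data: six clients, two per therapist.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
therapist = [1, 1, 2, 2, 3, 3]
v_ols, v_rob, v_cluster = three_variances(x, y, therapist)
```

If every observation is placed in its own cluster, the cluster sums are just the individual terms, so in this sketch the clustered result coincides with the robust (unclustered) one.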
Interpreting a difference between (2) the robust (unclustered) estimator and (3) the robust cluster estimator is straightforward. If the variance of the clustered estimator is less than that of the robust (unclustered) estimator, it means that the cluster sums of $e_i x_i$ have less variability than the individual $e_i x_i$. That is, when you sum the $e_i x_i$ within a cluster, some of the variation gets canceled out, and the total variation is less. This means that a big positive is summed with a big negative to produce something small; that is, there is negative correlation within cluster.
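A small numeric sketch makes the cancellation concrete. It assumes the simplest possible model, a regression on a constant only, so that each $x_i$ is 1 and the middle terms reduce to sums of residuals; multipliers are again omitted. The function name `mean_model_variances` and the data are hypothetical. The same eight observations are grouped two ways: once so that each cluster mixes a positive with a negative residual, and once so that like signs share a cluster.

```python
def mean_model_variances(y, cluster):
    """V_rob and V_cluster for a regression of y on a constant only."""
    n = len(y)
    ybar = sum(y) / n
    e = [yi - ybar for yi in y]            # x_i = 1, so e_i * x_i = e_i
    v_rob = sum(ei ** 2 for ei in e) / n ** 2
    sums = {}
    for c, ei in zip(cluster, e):
        sums[c] = sums.get(c, 0.0) + ei    # u_j: residual sum in cluster j
    v_cluster = sum(u ** 2 for u in sums.values()) / n ** 2
    return v_rob, v_cluster


# Residuals around the mean of 10 are 3, -2, 2, -3, 4, -3, 1, -2.
y = [13.0, 8.0, 12.0, 7.0, 14.0, 7.0, 11.0, 8.0]
mixed = [0, 0, 1, 1, 2, 2, 3, 3]   # each cluster pairs a positive with a negative
alike = [0, 1, 0, 1, 2, 3, 2, 3]   # each cluster holds residuals of one sign

print(mean_model_variances(y, mixed))   # (0.875, 0.0625)
print(mean_model_variances(y, alike))   # (0.875, 1.5625)
```

With the mixed grouping the within-cluster residuals are negatively correlated and the clustered variance falls well below the unclustered one; with the alike grouping the correlation is positive and the clustered variance is larger.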
Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier. In (1) the squared residuals are summed, but in (2) and (3) the residuals are multiplied by the x’s (then for (3) summed within cluster) and then "squared" and summed. Hence, any difference between them has to do with correlations between the residuals and the x’s. If big (in absolute value) $e_i$ are paired with big $x_i$, then the robust variance estimate will be bigger than the OLS estimate. If, on the other hand, the robust variance estimate is smaller than the OLS estimate, what’s happening is not clear at all but has to do with some odd correlations between the residuals and the x’s.
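The pairing of residual size with predictor size can be illustrated with a toy calculation. To keep everything scalar, this sketch assumes a regression through the origin with one predictor (so $k = 1$), which is a simplification and not the model in the question; multipliers are omitted, and the function name `origin_variances` and both data sets are hypothetical.

```python
def origin_variances(x, y):
    """V_OLS and V_rob for y = b*x + error, fitted without a constant."""
    n, k = len(x), 1
    sxx = sum(xi ** 2 for xi in x)                      # X'X is a scalar here
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    e = [yi - b * xi for xi, yi in zip(x, y)]           # residuals
    v_ols = (sum(ei ** 2 for ei in e) / (n - k)) / sxx
    v_rob = sum((ei * xi) ** 2 for ei, xi in zip(e, x)) / sxx ** 2
    return v_ols, v_rob


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Both series are 2*x plus alternating-sign disturbances.
y_big_with_big = [2.5, 3.0, 7.5, 6.0, 12.5, 9.0]      # disturbances grow with x
y_big_with_small = [5.0, 1.5, 8.0, 6.5, 11.0, 11.5]   # disturbances shrink with x

print(origin_variances(x, y_big_with_big))
print(origin_variances(x, y_big_with_small))
```

In the first data set the large residuals sit at large x and the robust estimate exceeds the OLS estimate. In the second, constructed deliberately so that large residuals sit at small x, the robust estimate comes out smaller. That is one way, in this toy setting, to produce the kind of odd pattern described above.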
If the OLS model is true, the residuals should, of course, be uncorrelated with the x’s. Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS estimator and (2) the robust (unclustered) estimator are approximately the same when the default multiplier is used. When the optional multiplier obtained by specifying the hc2 option is used, then the expected values are equal; indeed, the hc2 multiplier was constructed so that this would be true. For more information on these multipliers, see example 6 and the Methods and Formulas section in [R] regress.
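The claim about expected values can be checked exactly in a toy setting. The sketch below assumes a regression through the origin with one predictor, errors that are independently +1 or -1 with equal probability (so they have variance 1 and satisfy the OLS assumptions), an N/(N - k) factor standing in for the default multiplier, and the usual HC2 adjustment that divides each squared residual by one minus its leverage. These are assumptions of the illustration; see [R] regress for the exact formulas Stata uses. Averaging over all 64 sign patterns gives the exact expectation of each estimator. The function name `estimates` is hypothetical.

```python
from itertools import product


def estimates(x, eps):
    """V_OLS, robust with N/(N-k), and HC2 for a no-constant fit to pure noise."""
    n, k = len(x), 1
    sxx = sum(xi ** 2 for xi in x)
    b = sum(xi * ei for xi, ei in zip(x, eps)) / sxx
    e = [ei - b * xi for xi, ei in zip(x, eps)]          # residuals
    h = [xi ** 2 / sxx for xi in x]                      # leverages h_ii
    v_ols = (sum(ei ** 2 for ei in e) / (n - k)) / sxx
    raw = sum((ei * xi) ** 2 for ei, xi in zip(e, x)) / sxx ** 2
    v_default = n / (n - k) * raw
    v_hc2 = sum(ei ** 2 / (1 - hi) * xi ** 2
                for ei, hi, xi in zip(e, h, x)) / sxx ** 2
    return v_ols, v_default, v_hc2


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
patterns = list(product([-1.0, 1.0], repeat=len(x)))     # all 2^6 error vectors
results = [estimates(x, p) for p in patterns]
mean_ols, mean_default, mean_hc2 = (sum(col) / len(results)
                                    for col in zip(*results))
print(mean_ols, mean_default, mean_hc2)
```

In this setup the averages for the OLS and HC2 estimators agree to rounding error, while the average with the N/(N - k) factor is close but not identical. With only six observations the gap is visible; it would be expected to narrow as N grows.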
So, if the robust (unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS assumptions are true and you are seeing a bit of random variation. If the robust (unclustered) estimates are much smaller than the OLS estimates, then either you are seeing a lot of random variation (which is possible, but unlikely) or else there is something odd going on between the residuals and the x’s.
The question implied a comparison of (1) OLS versus (3) clustered. I suggest that the (2) robust unclustered estimates also be examined. But I bet that (1) and (2) will be about the same, with (3) still “in many cases ... much smaller”. And the simple explanation for this is negative correlation within cluster.
The questioner mentioned analyzing client data clustered within therapist. If every therapist has some extreme (i.e., big residual) clients, but few therapists have no (or only a few) extreme clients and few therapists have many extreme clients, then one could see a cancellation of variation when the residuals are summed over clusters. So the answer to the question, “Does this seem reasonable?” is yes.
However, since what you are seeing is an effect due to (negative) correlation of residuals, it is important to make sure that the model is reasonably specified and that it includes suitable within-cluster predictors. With the right predictors, the correlation of residuals could disappear, and certainly this would be a better model.
When you are using the robust cluster variance estimator, it’s still important for the specification of the model to be reasonable—so that the model has a reasonable interpretation and yields good predictions—even though the robust cluster variance estimator is robust to misspecification and within-cluster correlation.