How can the standard errors with the vce(cluster clustvar) option be smaller than those
without the vce(cluster clustvar) option?
Comparison of standard errors for robust, cluster, and standard estimators
William Sribney, StataCorp
July 1998; minor revisions July 2009
I ran a regression with data for clients clustered by therapist. I first
estimated the regression without using the vce(cluster clustvar) option,
then I re-ran it using the vce(cluster clustvar) option. In many cases,
the standard errors were much smaller when I used the
vce(cluster clustvar) option. Does this seem reasonable?
The short answer is that this can happen when the intracluster correlations
are negative.
Let me back up and explain the mechanics of what can happen to the
standard errors.
Let’s consider the following three estimators available with the
regress command: the ordinary least squares
(OLS) estimator, the robust estimator obtained when the
vce(robust) option is specified (without the
vce(cluster clustvar) option), and the robust cluster estimator obtained
when the vce(cluster clustvar) option is specified.
Comparing the three variance estimators: OLS, robust, and robust cluster
The formulas for the estimators are
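- OLS variance estimator:
  $V_{OLS} = s^2 (X'X)^{-1}$
  where
  $s^2 = \frac{1}{N - k} \sum_{i=1}^{N} e_i^2$
- Robust (unclustered) variance estimator:
  $V_{rob} = (X'X)^{-1} \left[ \sum_{i=1}^{N} (e_i x_i)' (e_i x_i) \right] (X'X)^{-1}$
- Robust cluster variance estimator:
  $V_{cluster} = (X'X)^{-1} \left[ \sum_{j=1}^{n_c} u_j' u_j \right] (X'X)^{-1}$
  where
  $u_j = \sum_{i \in \text{cluster } j} e_i x_i$
  and $n_c$ is the total number of clusters.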
Above, $e_i$ is the residual for the ith observation,
$x_i$ is a row vector of predictors including the constant,
N is the number of observations, and k is the number of predictors
(including the constant).
For simplicity, I omitted the multipliers (which are close to 1) from the
formulas for $V_{rob}$ and $V_{cluster}$.
The formula for the clustered estimator is simply that of the robust
(unclustered) estimator with the individual
$e_i x_i$’s replaced by their sums over each cluster.
See the manual entries [R] regress (back of Methods and Formulas),
(the beginning of the entry), and [SVY] variance estimation
for more details.
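To make the three formulas concrete, here is a minimal sketch in plain Python (not Stata) for a regression of y on a constant and one predictor. The data, the therapist labels, and the function names are made up for illustration, and the multipliers are omitted, as in the formulas above, so the numbers will not match regress output exactly.

```python
from collections import defaultdict

def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def three_variances(x, y, cluster):
    """Return (V_OLS, V_rob, V_cluster) for y = b0 + b1*x, multipliers omitted."""
    n, k = len(x), 2
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    b0 = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
    b1 = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]           # residuals e_i
    scores = [(ei, ei * xi) for ei, xi in zip(e, x)]          # e_i * x_i, with x_i = (1, x)

    def sandwich(rows):
        # (X'X)^-1 * [sum of r'r over rows] * (X'X)^-1
        meat = [[sum(r[i] * r[j] for r in rows) for j in range(2)]
                for i in range(2)]
        return matmul2(matmul2(xtx_inv, meat), xtx_inv)

    s2 = sum(ei * ei for ei in e) / (n - k)
    v_ols = [[s2 * v for v in row] for row in xtx_inv]
    v_rob = sandwich(scores)
    sums = defaultdict(lambda: [0.0, 0.0])   # u_j: scores summed within cluster j
    for (s0, s1), c in zip(scores, cluster):
        sums[c][0] += s0
        sums[c][1] += s1
    v_cluster = sandwich(list(sums.values()))
    return v_ols, v_rob, v_cluster

# Hypothetical toy data: two therapists (clusters) with four clients each.
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [3, 1, 2, 5, -1, 3, 4, 3]
therapist = ["A"] * 4 + ["B"] * 4

v_ols, v_rob, v_cluster = three_variances(x, y, therapist)
print(v_ols[1][1], v_rob[1][1], v_cluster[1][1])  # slope variances
```

On this toy data the slope variances come out to about 0.233 (OLS), 0.235 (robust), and 0.045 (cluster). Passing a distinct cluster label for every observation makes the clustered result coincide with the robust (unclustered) one, which is a quick check that the clustered formula only replaces individual scores by their cluster sums.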
Interpreting a difference between (2) the robust (unclustered) estimator and
(3) the robust cluster estimator is straightforward. If the variance of the
clustered estimator is less than that of the robust (unclustered) estimator, it
means that the cluster sums of $e_i x_i$ have less
variability than the individual $e_i x_i$. That is, when
you sum the $e_i x_i$ within a cluster, some of the
variation gets canceled out, and the total variation is less. This means
that a big positive is summed with a big negative to produce something
small: there is negative correlation within cluster.
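The cancellation can be seen with simple arithmetic. The sketch below, in plain Python, treats the scores as single numbers rather than vectors, and the values and cluster labels are invented for illustration. It compares the sum of squared individual scores with the sum of squared cluster totals.

```python
from collections import defaultdict

def sum_sq_individual(scores):
    """Sum of squared individual scores (unclustered middle term, scalar case)."""
    return sum(s * s for s in scores)

def sum_sq_cluster(scores, clusters):
    """Sum of squared within-cluster totals (clustered middle term, scalar case)."""
    totals = defaultdict(float)
    for s, c in zip(scores, clusters):
        totals[c] += s
    return sum(u * u for u in totals.values())

clusters = ["A", "A", "B", "B", "C", "C"]
# Hypothetical values of e_i * x_i; both lists have the same magnitudes.
opposite_signs = [4.0, -3.0, 2.0, -2.5, 3.0, -3.5]  # negative within-cluster correlation
same_signs = [4.0, 3.0, -2.0, -2.5, 3.0, 3.5]       # positive within-cluster correlation

print(sum_sq_individual(opposite_signs))         # 56.5
print(sum_sq_cluster(opposite_signs, clusters))  # 1.5
print(sum_sq_cluster(same_signs, clusters))      # 111.5
```

With opposite signs inside each cluster, the clustered term is far smaller than the unclustered one. With matching signs it is larger, which is the more familiar case where clustering inflates the standard errors.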
Interpreting a difference between (1) the OLS estimator and (2) or (3) is
trickier. In (1) the squared residuals are summed, but in (2) and (3) the
residuals are multiplied by the x’s (then for (3) summed within
cluster) and then "squared" and summed. Hence, any difference between them
has to do with correlations between the residuals and the x’s. If big
(in absolute value) $e_i$ are paired with big $x_i$, then
the robust variance estimate will be bigger than the OLS estimate. If, on
the other hand, the robust variance estimate is smaller than the OLS
estimate, what’s happening is not clear at all but has to do with some
odd correlations between the residuals and the x’s.
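A small numerical sketch of both cases, in plain Python with invented data: the predictor is centered at zero, so the slope variances reduce to scalar expressions, and the residuals are chosen by hand so that they satisfy the OLS normal equations. Multipliers are omitted, and the function name is hypothetical.

```python
def slope_variances(x, e):
    """Return (V_OLS, V_rob) for the slope, given centered x and OLS residuals e.

    With sum(x) == 0, X'X is diagonal, so the slope entries simplify to
    V_OLS = s^2 / sum(x^2) and V_rob = sum((e*x)^2) / sum(x^2)^2.
    """
    n, k = len(x), 2
    # The hand-picked residuals must be orthogonal to the constant and to x.
    assert sum(x) == 0 and sum(e) == 0
    assert sum(xi * ei for xi, ei in zip(x, e)) == 0
    sxx = sum(xi * xi for xi in x)
    s2 = sum(ei * ei for ei in e) / (n - k)
    v_ols = s2 / sxx
    v_rob = sum((ei * xi) ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    return v_ols, v_rob

x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
e_big_at_big_x = [3, -3, 1, -1, 0, 0, 1, -1, 3, -3]    # big |e| where |x| is big
e_big_at_small_x = [1, -1, 1, -1, 3, -3, 1, -1, 1, -1]  # big |e| where |x| is small

print(slope_variances(x, e_big_at_big_x))    # (0.25, 0.37)
print(slope_variances(x, e_big_at_small_x))  # (0.1625, 0.05)
```

In the first pattern the robust estimate exceeds the OLS estimate. In the second, the large residuals sit where x contributes nothing to the slope, and the robust estimate falls well below the OLS one.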
If the OLS model is true, the residuals should, of course, be uncorrelated
with the x’s. Indeed, if all the assumptions of the OLS model are
true, then the expected values of (1) the OLS estimator and (2) the robust
(unclustered) estimator are approximately the same when the default
multiplier is used. When the optional multiplier obtained by specifying the
hc2 option is used, then the expected values are equal; indeed, the
hc2 multiplier was constructed so that this would be true. For more
information on these multipliers, see example 5 and the Methods and Formulas
section in [R] regress.
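A short sketch of why the hc2 multiplier gives exact equality. It assumes fixed predictors and independent errors with constant variance $\sigma^2$, and it writes $h_{ii}$ for the leverage of observation i; both symbols are introduced here and do not appear in the formulas above.

```latex
\begin{align*}
h_{ii} &= x_i (X'X)^{-1} x_i', \qquad E[e_i^2] = \sigma^2 (1 - h_{ii}) \\
V_{hc2} &= (X'X)^{-1} \Big[ \sum_{i=1}^{N} \frac{e_i^2}{1 - h_{ii}} \, x_i' x_i \Big] (X'X)^{-1} \\
E[V_{hc2}] &= (X'X)^{-1} \big[ \sigma^2 X'X \big] (X'X)^{-1}
            = \sigma^2 (X'X)^{-1} = E[V_{OLS}]
\end{align*}
```

The default multiplier N/(N - k) applies the same correction only on average: because the $h_{ii}$ sum to k, the mean of $1 - h_{ii}$ is (N - k)/N. That is why the expected values agree only approximately in the default case.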
So, if the robust (unclustered) estimates are just a little smaller than the
OLS estimates, it may be that the OLS assumptions are true and you are
seeing a bit of random variation. If the robust (unclustered) estimates are
much smaller than the OLS estimates, then either you are seeing a lot of
random variation (which is possible, but unlikely) or else there is
something odd going on between the residuals and the x’s.
Back to the detailed question
The question implied a comparison of (1) OLS versus (3) clustered. I
suggest that the (2) robust unclustered estimates also be examined. But I
bet that (1) and (2) will be about the same, with (3) still “in many
cases ... much smaller”. And the simple explanation for this is
negative correlation within cluster.
The questioner mentioned analyzing client data clustered within therapist.
If nearly every therapist has some extreme (i.e., big residual) clients, so
that few therapists have no (or only a few) extreme clients and few therapists
have many extreme clients, then one could see a cancellation of variation when
the residuals are summed over clusters. So the answer to the question,
“Does this seem reasonable?” is yes.
However, since what you are seeing is an effect due to (negative)
correlation of residuals, it is important to make sure that the model is
reasonably specified and that it includes suitable within-cluster
predictors. With the right predictors, the correlation of residuals could
disappear, and certainly this would be a better model.
When you are using the robust cluster variance estimator, it’s still
important for the specification of the model to be reasonable—so that
the model has a reasonable interpretation and yields good
predictions—even though the robust cluster variance estimator is
robust to misspecification and within-cluster correlation.