Title | Comparison of standard errors for robust, cluster, and standard estimators |

Author | William Sribney, StataCorp |

I ran a regression with data for clients clustered by therapist. I first estimated the regression without using the vce(cluster clustvar) option, then I re-ran it using the vce(cluster clustvar) option. In many cases, the standard errors were much smaller when I used the vce(cluster clustvar) option. Does this seem reasonable?

The short answer is that this can happen when the intracluster correlations are negative.

Let me back up and explain the mechanics of what can happen to the standard errors.

Let’s consider the following three estimators available with the regress command: the ordinary least squares (OLS) estimator, the robust estimator obtained when the vce(robust) option is specified (without the vce(cluster clustvar) option), and the robust cluster estimator obtained when the vce(cluster clustvar) option is specified.

The formulas for the estimators are

1. OLS variance estimator:

   $$V_{OLS} = s^{2} (X'X)^{-1}, \quad \text{where} \quad s^{2} = \frac{1}{N-k} \sum_{i=1}^{N} e_{i}^{2}$$

2. Robust (unclustered) variance estimator:

   $$V_{rob} = (X'X)^{-1} \left[ \sum_{i=1}^{N} (e_{i} x_{i})' (e_{i} x_{i}) \right] (X'X)^{-1}$$

3. Robust cluster variance estimator:

   $$V_{cluster} = (X'X)^{-1} \left[ \sum_{j=1}^{n_{c}} u_{j}' u_{j} \right] (X'X)^{-1}, \quad \text{where} \quad u_{j} = \sum_{i \in \text{cluster } j} e_{i} x_{i}$$

   and n_{c} is the total number of clusters.

Above, e_{i} is the residual for the *i*th observation and
x_{i} is a row vector of predictors including the constant.

For simplicity, I omitted the multipliers (which are close to 1) from the
formulas for V_{rob} and V_{cluster}.

The formula for the clustered estimator is simply that of the robust
(unclustered) estimator with the individual
e_{i}*x_{i}’s replaced by their sums over each
cluster.
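As a minimal sketch of how the three formulas relate, the following Python code computes all three for a regression of y on a constant and one predictor. It is an illustration only, not Stata's implementation: the function name `three_variances` and the data are made up, and, as in the formulas above, the multipliers for the robust and clustered estimators are omitted, so the numbers would not match `regress` output exactly.

```python
# Sketch (hypothetical names and data): OLS, robust, and cluster-robust
# variance estimates for y = b0 + b1*x, without finite-sample multipliers
# on the robust and clustered estimators.

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def outer_sum(vectors):
    """Sum of v'v over 1x2 row vectors v, returned as a 2x2 matrix."""
    return [[sum(v[i] * v[j] for v in vectors) for j in range(2)]
            for i in range(2)]

def three_variances(y, x, cluster):
    n, k = len(y), 2
    rows = [(1.0, xi) for xi in x]              # x_i = (constant, predictor)
    xtx_inv = inv2(outer_sum(rows))             # (X'X)^{-1}
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(2)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(2)) for i in range(2)]
    e = [yi - (b[0] + b[1] * xi) for yi, xi in zip(y, x)]   # residuals

    # (1) OLS: s^2 * (X'X)^{-1}
    s2 = sum(ei ** 2 for ei in e) / (n - k)
    v_ols = [[s2 * v for v in row] for row in xtx_inv]

    # (2) Robust: sandwich with the individual e_i * x_i in the middle
    scores = [(ei * r[0], ei * r[1]) for ei, r in zip(e, rows)]
    v_rob = matmul2(matmul2(xtx_inv, outer_sum(scores)), xtx_inv)

    # (3) Cluster: same sandwich, with e_i * x_i summed within each cluster
    sums = {}
    for g, s in zip(cluster, scores):
        u = sums.setdefault(g, [0.0, 0.0])
        u[0] += s[0]
        u[1] += s[1]
    v_clu = matmul2(matmul2(xtx_inv, outer_sum(list(sums.values()))), xtx_inv)
    return b, v_ols, v_rob, v_clu

y = [1.0, 3.0, 2.0, 5.0]
x = [0.0, 1.0, 2.0, 3.0]
b, v_ols, v_rob, v_clu = three_variances(y, x, cluster=[1, 1, 2, 2])
print("slope variance: OLS", v_ols[1][1], "robust", v_rob[1][1],
      "cluster", v_clu[1][1])
```

If every observation is placed in its own cluster, the clustered result coincides with the robust (unclustered) one, which is the sense in which (3) is (2) with the e_{i}*x_{i} replaced by their cluster sums.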

See the manual entries [R] regress (back of Methods and Formulas),
[P] **_robust**
(the beginning of the entry), and [SVY] **variance estimation**
for more details.

Interpreting a difference between (2) the robust (unclustered) estimator and
(3) the robust cluster estimator is straightforward. If the variance from the
clustered estimator is less than that from the robust (unclustered) estimator,
it means that the cluster sums of e_{i}*x_{i} have less
variability than the individual e_{i}*x_{i}. That is, when
you sum the e_{i}*x_{i} within a cluster, some of the
variation gets canceled out, and the total variation is less. This means
that big positive values are summed with big negative values to produce
something small; that is, there is negative correlation within cluster.
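A short derivation from the formulas for (2) and (3) makes the cancellation explicit. Here l is a second observation index introduced only for this illustration, and the multipliers are again omitted. Expanding one cluster's contribution gives

$$u_{j}' u_{j} = \sum_{i \in j} (e_{i} x_{i})' (e_{i} x_{i}) + \sum_{i \in j} \sum_{l \in j,\, l \neq i} e_{i} e_{l} \, x_{i}' x_{l}$$

Because every observation belongs to exactly one cluster, summing the first term over clusters reproduces the middle of the robust (unclustered) formula, so

$$V_{cluster} - V_{rob} = (X'X)^{-1} \left[ \sum_{j=1}^{n_{c}} \sum_{i \in j} \sum_{l \in j,\, l \neq i} e_{i} e_{l} \, x_{i}' x_{l} \right] (X'X)^{-1}$$

Roughly speaking, the clustered variance falls below the robust one when these within-cluster cross products are negative on balance.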

Interpreting a difference between (1) the OLS estimator and (2) or (3) is
trickier. In (1) the squared residuals are summed, but in (2) and (3) the
residuals are multiplied by the x’s (then for (3) summed within
cluster) and then "squared" and summed. Hence, any difference between them
has to do with correlations between the residuals and the x’s. If big
(in absolute value) e_{i} are paired with big x_{i}, then
the robust variance estimate will be bigger than the OLS estimate. If, on
the other hand, the robust variance estimate is smaller than the OLS
estimate, what’s happening is not clear at all but has to do with some
odd correlations between the residuals and the x’s.
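The effect of this pairing can be illustrated for the slope in a regression on a constant and one predictor, where both estimators reduce to simple sums. In the sketch below (hypothetical data and function name; the multiplier for the robust estimator is omitted), the same set of residuals is attached first to the extreme x values and then to the central ones. Because the model includes a constant, what matters for the slope is the distance of x from its mean.

```python
# Sketch (hypothetical data): OLS vs. robust variance of the slope in
# y = b0 + b1*x. For this model the slope variances reduce to
#   OLS:    s^2 / Sxx
#   robust: sum((x_i - xbar)^2 * e_i^2) / Sxx^2   (no multiplier)

def slope_variances(y, x):
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]        # residuals
    v_ols = sum(ei ** 2 for ei in e) / (n - 2) / sxx
    v_rob = sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    return v_ols, v_rob

x = [-3, -3, -1, -1, 1, 1, 3, 3]
# Same eight disturbances, arranged two ways.
big_at_extremes = [4, -4, 0.5, -0.5, 0.5, -0.5, 4, -4]
big_at_center = [0.5, -0.5, 4, -4, 4, -4, 0.5, -0.5]

for label, noise in [("big residuals at extreme x", big_at_extremes),
                     ("big residuals at central x", big_at_center)]:
    y = [1 + 2 * xi + d for xi, d in zip(x, noise)]
    v_ols, v_rob = slope_variances(y, x)
    print(label, "-> OLS:", v_ols, "robust:", v_rob)
```

The OLS estimate is identical in the two arrangements because the residual sum of squares is the same. The robust estimate is larger than OLS in the first arrangement and smaller in the second, which is one pattern of residuals and x's that can produce a robust estimate below the OLS one.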

If the OLS model is true, the residuals should, of course, be uncorrelated
with the x’s. Indeed, if all the assumptions of the OLS model are
true, then the expected values of (1) the OLS estimator and (2) the robust
(unclustered) estimator are approximately the same when the default
multiplier is used. When the optional multiplier obtained by specifying the
**vce(hc2)** option is used, then the expected values are equal; indeed, the
**hc2** multiplier was constructed so that this would be true. For more
information on these multipliers, see example 6 and the Methods and Formulas
section in [R] regress.
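The reason the hc2 multiplier gives exact equality can be sketched as follows, assuming the standard definition of the HC2 estimator, in which each squared residual is divided by 1 - h_{ii}, where h_{ii} = x_{i} (X'X)^{-1} x_{i}' is the leverage of observation i. The symbols h_{ii} and σ^{2} (the common error variance) are introduced here for illustration, and X is treated as fixed.

$$V_{hc2} = (X'X)^{-1} \left[ \sum_{i=1}^{N} \frac{e_{i}^{2}}{1 - h_{ii}} \, x_{i}' x_{i} \right] (X'X)^{-1}$$

Under the OLS assumptions, E(e_{i}^{2}) = σ^{2} (1 - h_{ii}), so

$$E(V_{hc2}) = (X'X)^{-1} \left[ \sigma^{2} \sum_{i=1}^{N} x_{i}' x_{i} \right] (X'X)^{-1} = \sigma^{2} (X'X)^{-1} = E(V_{OLS})$$

A single overall multiplier such as N/(N - k) corrects the squared residuals only on average (the h_{ii} sum to k), which is why the agreement is approximate in that case.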

So, if the robust (unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS assumptions are true and you are seeing a bit of random variation. If the robust (unclustered) estimates are much smaller than the OLS estimates, then either you are seeing a lot of random variation (which is possible, but unlikely) or else there is something odd going on between the residuals and the x’s.

The question implied a comparison of (1) OLS versus (3) clustered. I suggest that the (2) robust unclustered estimates also be examined. But I bet that (1) and (2) will be about the same, with (3) still “in many cases ... much smaller”. And the simple explanation for this is negative correlation within cluster.

The questioner mentioned analyzing client data clustered within therapist. If nearly every therapist has some extreme (i.e., big-residual) clients, with few therapists having none (or only a few) and few having many, then the extreme residuals are spread fairly evenly across clusters, and one could see a cancellation of variation when the residuals are summed over clusters. So the answer to the question, “Does this seem reasonable?” is yes.
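A deliberately extreme toy version of this scenario can be sketched with a constant-only model, where x_{i} = 1, X'X = N, and the three estimators reduce to simple sums of residuals. The data, the function name, and the choice of four therapists with four clients each are all hypothetical, and the multipliers for the robust and clustered estimators are omitted.

```python
# Sketch (hypothetical data): constant-only model, so the variance of the
# estimated mean is
#   OLS:     s^2 / N
#   robust:  sum(e_i^2) / N^2
#   cluster: sum over clusters of (sum of e_i in cluster)^2 / N^2

def intercept_only_variances(y, cluster):
    n = len(y)
    mean = sum(y) / n
    e = [yi - mean for yi in y]                       # residuals
    v_ols = sum(ei ** 2 for ei in e) / (n - 1) / n
    v_rob = sum(ei ** 2 for ei in e) / n ** 2
    sums = {}
    for g, ei in zip(cluster, e):
        sums[g] = sums.get(g, 0.0) + ei               # u_j for each cluster
    v_clu = sum(u ** 2 for u in sums.values()) / n ** 2
    return v_ols, v_rob, v_clu

# Four therapists (0..3), four clients each.
therapist = [t for t in range(4) for _ in range(4)]

# Every therapist has exactly one extreme client.
y_spread = [10, 0, 0, 0] * 4
# The same outcomes, but all extreme clients belong to one therapist.
y_lumped = [10] * 4 + [0] * 12

for label, y in [("one extreme client per therapist", y_spread),
                 ("all extreme clients with one therapist", y_lumped)]:
    v_ols, v_rob, v_clu = intercept_only_variances(y, therapist)
    print(label, "-> OLS:", v_ols, "robust:", v_rob, "cluster:", v_clu)
```

In the first arrangement the residuals within each therapist sum to zero, so the clustered variance collapses while the OLS and robust estimates are unaffected by the arrangement. In the second, the residuals within a therapist share a sign, and the clustered variance is the largest of the three.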

However, since what you are seeing is an effect due to (negative) correlation of residuals, it is important to make sure that the model is reasonably specified and that it includes suitable within-cluster predictors. With the right predictors, the correlation of residuals could disappear, and certainly this would be a better model.

When you are using the robust cluster variance estimator, it’s still important for the specification of the model to be reasonable—so that the model has a reasonable interpretation and yields good predictions—even though the robust cluster variance estimator is robust to misspecification and within-cluster correlation.