Stata | FAQ: Comparison of standard errors for robust, cluster, and standard estimators

How can the standard errors with the vce(cluster clustvar) option be smaller than those without the vce(cluster clustvar) option?

Title		Comparison of standard errors for robust, cluster, and standard estimators
Author		William Sribney, StataCorp

Question:

I ran a regression with data for clients clustered by therapist. I first estimated the regression without using the vce(cluster clustvar) option, then I re-ran it using the vce(cluster clustvar) option. In many cases, the standard errors were much smaller when I used the vce(cluster clustvar) option. Does this seem reasonable?

Answer

The short answer is that this can happen when the intracluster correlations are negative.

Let me back up and explain the mechanics of what can happen to the standard errors.

Let’s consider the following three estimators available with the regress command: (1) the ordinary least squares (OLS) estimator, (2) the robust estimator obtained when the vce(robust) option is specified (without the vce(cluster clustvar) option), and (3) the robust cluster estimator obtained when the vce(cluster clustvar) option is specified.

Comparing the three variance estimators: OLS, robust, and robust cluster

The formulas for the estimators are

OLS variance estimator:

$$ V_{OLS} = s^2 (X'X)^{-1} $$

where

$$ s^2 = \frac{1}{N-k} \sum_{i=1}^{N} e_i^2 $$

Robust (unclustered) variance estimator:

$$ V_{rob} = (X'X)^{-1} \Bigl[ \sum_{i=1}^{N} (e_i x_i)'(e_i x_i) \Bigr] (X'X)^{-1} $$

Robust cluster variance estimator:

$$ V_{cluster} = (X'X)^{-1} \Bigl[ \sum_{j=1}^{n_c} u_j' u_j \Bigr] (X'X)^{-1} $$

where

$$ u_j = \sum_{i \in \text{cluster } j} e_i x_i $$

and n_c is the total number of clusters.

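Above, e_i is the residual for the ith observation, x_i is the row vector of predictors (including the constant) for that observation, X is the matrix whose rows are the x_i, N is the number of observations, and k is the number of predictors including the constant.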

For simplicity, I omitted the multipliers (which are close to 1) from the formulas for V_rob and V_cluster.

The formula for the clustered estimator is simply that of the robust (unclustered) estimator with the individual e_i*x_i’s replaced by their sums over each cluster.
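The three formulas translate almost line by line into code. The sketch below is a minimal, hypothetical pure-Python implementation (the names `variances`, `inverse`, `outer`, and the rest are illustrative and not part of Stata). Like the formulas above, it omits the multipliers, so its numbers will differ slightly from what regress reports. Note that the cluster version reuses the same scores e_i*x_i and only sums them within cluster before forming the outer products.

```python
# Minimal sketch of the three variance estimators (OLS, robust, robust cluster)
# as written above, with the small-sample multipliers omitted.
# All names here are hypothetical illustrations, not a Stata API.


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    # Augment [A | I] and reduce the left half to the identity.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def outer(u):
    """u'u for a 1 x k row vector u, giving a k x k matrix."""
    return [[ui * uj for uj in u] for ui in u]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def variances(X, y, cluster):
    """Return (V_OLS, V_rob, V_cluster) for the OLS regression of y on X.

    X: list of rows x_i, each including the constant term.
    cluster: one cluster identifier per observation.
    """
    N, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    bread = inverse(xtx)                                  # (X'X)^-1
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]
    e = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]

    # (1) OLS: s^2 (X'X)^-1 with s^2 = sum(e_i^2) / (N - k)
    s2 = sum(ei ** 2 for ei in e) / (N - k)
    v_ols = [[s2 * v for v in row] for row in bread]

    zero = [[0.0] * k for _ in range(k)]
    meat_rob, u_by_cluster = zero, {}
    for ei, row, c in zip(e, X, cluster):
        score = [ei * xv for xv in row]                   # e_i * x_i
        meat_rob = add(meat_rob, outer(score))            # (2) one term per observation
        u = u_by_cluster.setdefault(c, [0.0] * k)         # (3) u_j accumulates per cluster
        for a in range(k):
            u[a] += score[a]

    meat_cl = zero
    for u in u_by_cluster.values():
        meat_cl = add(meat_cl, outer(u))

    v_rob = matmul(matmul(bread, meat_rob), bread)
    v_cl = matmul(matmul(bread, meat_cl), bread)
    return v_ols, v_rob, v_cl
```

For a model with two predictors, each row of X would be [1.0, x1, x2], and the standard errors are the square roots of the diagonal elements of each returned matrix. Passing a distinct cluster identifier for every observation makes V_cluster identical to V_rob, a quick check that the clustered formula differs only by summing within cluster.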

See the manual entries [R] regress (back of Methods and Formulas), [P] _robust (the beginning of the entry), and [SVY] variance estimation for more details.

Interpreting a difference between (2) the robust (unclustered) estimator and (3) the robust cluster estimator is straightforward. If the clustered variance estimate is smaller than the robust (unclustered) one, the cluster sums of e_i*x_i have less variability than the individual e_i*x_i. That is, when you sum the e_i*x_i within a cluster, some of the variation cancels out, and the total variation is smaller. Big positive values are being summed with big negative values to produce something small: there is negative correlation within cluster.
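The cancellation can be written out exactly. Writing s_i = e_i x_i for the score of observation i and expanding each u_j'u_j in the formulas above splits the clustered middle term into the robust middle term plus within-cluster cross-products (multipliers omitted, as above). This is plain algebra on those formulas:

```latex
\sum_{j=1}^{n_c} u_j' u_j
  = \sum_{i=1}^{N} s_i' s_i
  + \sum_{j=1}^{n_c} \sum_{i \in j} \sum_{\substack{l \in j \\ l \neq i}} s_i' s_l

V_{cluster} = V_{rob} + (X'X)^{-1}
  \Bigl[ \sum_{j=1}^{n_c} \sum_{i \in j} \sum_{\substack{l \in j \\ l \neq i}} s_i' s_l \Bigr]
  (X'X)^{-1}
```

For a single coefficient, the diagonal element of the correction term is a sum of products z_i z_l over pairs of distinct observations in the same cluster, where z_i = s_i c and c is the corresponding column of (X'X)^{-1}. When those within-cluster products are negative on balance, the clustered variance for that coefficient is smaller than the robust one.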

Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier. In (1) the squared residuals are summed, but in (2) and (3) the residuals are multiplied by the x’s (and, for (3), summed within cluster) and then "squared" and summed. Hence, any difference between them has to do with correlations between the residuals and the x’s. If big (in absolute value) e_i are paired with big x_i, then the robust variance estimate will be bigger than the OLS estimate. If, on the other hand, the robust variance estimate is smaller than the OLS estimate, what is happening is much less clear, but it still has to do with some unusual correlation between the residuals and the x’s.
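A special case shows exactly how the pairing of residuals and x’s enters. Assume, for illustration only, a single predictor and no constant, so that k = 1, each x_i is a scalar, and X'X is the sum of the x_i². The formulas above then give

```latex
V_{OLS} = \frac{s^2}{\sum_{i=1}^{N} x_i^2}, \qquad
V_{rob} = \frac{\sum_{i=1}^{N} e_i^2 x_i^2}{\bigl(\sum_{i=1}^{N} x_i^2\bigr)^2}

\frac{V_{rob}}{V_{OLS}}
  = \frac{(N-1)\sum_{i=1}^{N} e_i^2 x_i^2}{\sum_{i=1}^{N} e_i^2 \,\sum_{i=1}^{N} x_i^2}
  = \frac{N-1}{N}\left[ 1 + \frac{\widehat{\mathrm{Cov}}(e_i^2, x_i^2)}{\overline{e^2}\;\overline{x^2}} \right]
```

where the bars denote sample means and the covariance uses divisor N. Big |e_i| at big |x_i| makes the covariance positive and V_rob larger than V_OLS; the robust estimate falls noticeably below the OLS one only when the big residuals tend to sit at small |x_i|.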

If the OLS model is true, the residuals should, of course, be uncorrelated with the x’s. Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS estimator and (2) the robust (unclustered) estimator are approximately the same when the default multiplier is used. When the alternative adjustment obtained by specifying the vce(hc2) option is used, the expected values are exactly equal; indeed, the hc2 adjustment was constructed so that this would be true. For more information on these multipliers, see example 6 and the Methods and Formulas section in [R] regress.
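The hc2 claim can be checked directly, conditional on X and under the OLS assumptions with error variance σ². Let H = X(X'X)^{-1}X' and let h_ii be its diagonal elements (the leverages). Then Var(e) = σ²(I - H), so E(e_i²) = σ²(1 - h_ii). The hc2 adjustment divides each squared residual by 1 - h_ii, which gives

```latex
E[V_{OLS}] = E[s^2]\,(X'X)^{-1} = \sigma^2 (X'X)^{-1}

E[V_{hc2}] = (X'X)^{-1} \Bigl[ \sum_{i=1}^{N} \frac{E(e_i^2)}{1 - h_{ii}}\, x_i' x_i \Bigr] (X'X)^{-1}
           = (X'X)^{-1} \bigl[ \sigma^2 X'X \bigr] (X'X)^{-1}
           = \sigma^2 (X'X)^{-1}

E[V_{rob}] = \sigma^2 (X'X)^{-1} \Bigl[ \sum_{i=1}^{N} (1 - h_{ii})\, x_i' x_i \Bigr] (X'X)^{-1}
```

Without the adjustment, the bracket is X'X minus the sum of h_ii x_i'x_i, so the unadjusted robust estimator is biased slightly downward. Because the h_ii sum to k, they average k/N, which is why a multiplier close to 1, such as N/(N - k), makes the two approximately equal in expectation.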

So, if the robust (unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS assumptions are true and you are seeing a bit of random variation. If the robust (unclustered) estimates are much smaller than the OLS estimates, then either you are seeing a lot of random variation (which is possible, but unlikely) or else there is something odd going on between the residuals and the x’s.

Back to the detailed question

The question implied a comparison of (1) OLS versus (3) clustered. I suggest also examining (2), the robust unclustered estimates. But I bet that (1) and (2) will be about the same, with (3) still “in many cases ... much smaller”. The simple explanation for this is negative correlation within cluster.

The questioner mentioned analyzing client data clustered within therapist. If every therapist has some extreme (i.e., big residual) clients, with few therapists having none (or only a few) and few having many, then one could see a cancellation of variation when the residuals are summed over clusters. So the answer to the question “Does this seem reasonable?” is yes.
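A minimal numeric sketch of this situation, assuming hypothetical data (the outcomes and therapist labels below are made up, not taken from the question) and the simplest possible model, a regression on a constant only, so that x_i = 1 and the estimated coefficient is the sample mean. Each therapist has one very high and one very low client, so the residuals nearly cancel within therapist:

```python
import math

# Hypothetical data: outcomes for 4 clients of each of 4 therapists.
# Every therapist has one very high and one very low client.
therapists = {
    "T1": [15, 5, 10, 11],
    "T2": [4, 16, 9, 10],
    "T3": [13, 8, 6, 12],
    "T4": [17, 3, 11, 10],
}


def mean_standard_errors(groups):
    """Standard errors of the mean (regression on a constant only) under the
    OLS, robust, and robust cluster formulas, multipliers omitted."""
    y = [v for vals in groups.values() for v in vals]
    n, k = len(y), 1
    ybar = sum(y) / n
    # With x_i = 1: (X'X)^-1 = 1/n and each score e_i * x_i is just e_i.
    ss = sum((v - ybar) ** 2 for v in y)
    v_ols = ss / (n - k) / n
    v_rob = ss / n ** 2
    # Cluster sums u_j of the residuals, one per therapist.
    u = [sum(v - ybar for v in vals) for vals in groups.values()]
    v_cl = sum(uj ** 2 for uj in u) / n ** 2
    return math.sqrt(v_ols), math.sqrt(v_rob), math.sqrt(v_cl)


se_ols, se_rob, se_cl = mean_standard_errors(therapists)
print(f"OLS {se_ols:.3f}  robust {se_rob:.3f}  cluster {se_cl:.3f}")
# prints: OLS 1.033  robust 1.000  cluster 0.125
```

Here the individual residuals have a sum of squares of 256, but the four therapist sums are only 1, -1, -1, and 1, so (1) and (2) agree closely while (3) is one eighth the size of (2), the pattern described above.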

However, since what you are seeing is an effect due to (negative) correlation of residuals, it is important to make sure that the model is reasonably specified and that it includes suitable within-cluster predictors. With the right predictors, the correlation of residuals could disappear, and that would certainly be a better model.

When you are using the robust cluster variance estimator, it’s still important for the specification of the model to be reasonable, so that the model has a reasonable interpretation and yields good predictions, even though the robust cluster variance estimator is robust to misspecification and within-cluster correlation.
