Reply: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

Date

Thu, 6 Jan 2011 01:20:26 +0100

Hi,

Just two caveats:

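1) When using the cluster option, you need a reasonably large number of clusters to benefit from its properties, because the justification for the cluster-robust variance estimator relies on the number of clusters being large (Kit can probably elaborate on this). I suspect 4 clusters is far from enough.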
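To illustrate the point about few clusters, here is a small Monte Carlo sketch in Python rather than Stata. The data-generating process, the cluster sizes and the helper names `clustered_t` and `rejection_rate` are hypothetical. The variance uses the CR1 finite-sample factor G/(G-1)·(N-1)/(N-K), which as far as I know is what Stata's `regress` applies with `vce(cluster)`. The sketch counts how often a true null of zero slope is rejected at the normal critical value 1.96.

```python
import numpy as np

def clustered_t(y, x, groups):
    """t-statistic for the slope of y on x (with a constant), using a CR1 cluster-robust SE."""
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    labels, idx = np.unique(groups, return_inverse=True)
    G = len(labels)
    # Per-cluster score sums X_g' e_g, one row per cluster, shape (G, k)
    scores = np.column_stack([np.bincount(idx, weights=X[:, j] * e) for j in range(k)])
    V = (G / (G - 1)) * ((n - 1) / (n - k)) * XtX_inv @ (scores.T @ scores) @ XtX_inv
    return beta[1] / np.sqrt(V[1, 1])

def rejection_rate(n_clusters, n_per=50, reps=1000, seed=1):
    """Share of simulated samples in which a TRUE null (slope = 0) is rejected at |t| > 1.96."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(n_clusters), n_per)
    rejections = 0
    for _ in range(reps):
        # Both x and the error get a cluster-level component, so each is correlated within clusters
        x = rng.normal(size=n_clusters)[groups] + rng.normal(size=groups.size)
        u = rng.normal(size=n_clusters)[groups] + rng.normal(size=groups.size)
        y = 1.0 + u  # the true slope on x is zero
        rejections += abs(clustered_t(y, x, groups)) > 1.96
    return rejections / reps

for G in (4, 50):
    print(f"{G} clusters: rejection rate of a true null = {rejection_rate(G):.3f}")
```

With only 4 clusters the rejection rate should come out well above the nominal 5%, and with 50 clusters it should be much closer to 5%. Stata evaluates clustered t-statistics for `regress` against a t distribution with G-1 degrees of freedom, which helps with few clusters; the fixed 1.96 cut-off here is a simplification.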

2) In cross-sectional micro data, I would use the cluster option when my variable of interest varies only across countries, since the standard errors are then corrected for the within-country correlation. One example would be an institutional variable such as democracy.
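The size of the correction in point 2 can be gauged with the Moulton approximation: for equal-sized clusters of m observations, the true variance of an OLS slope is roughly 1 + (m-1)·ρ_x·ρ_u times the conventional one, where ρ_x and ρ_u are the within-cluster correlations of the regressor and of the error. A regressor that varies only across countries has ρ_x = 1. The Python function below is a sketch of that formula; its name and the example inputs are hypothetical.

```python
import math

def moulton_se_ratio(m, rho_x, rho_u):
    """Approximate ratio of the true SE to the conventional OLS SE (Moulton factor).

    m     : observations per cluster (equal cluster sizes assumed)
    rho_x : within-cluster correlation of the regressor (1.0 if it varies only across countries)
    rho_u : within-cluster correlation of the error term
    """
    return math.sqrt(1 + (m - 1) * rho_x * rho_u)

# Hypothetical inputs: a country-level regressor such as democracy (rho_x = 1),
# 1,000 respondents per country, and a modest error correlation of 0.05
print(round(moulton_se_ratio(1000, 1.0, 0.05), 2))
```

With these made-up inputs the conventional SE would be too small by a factor of roughly 7, even though the error correlation is only 0.05.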

As to your question: is z a vector of country characteristics in your micro model? That could explain your finding.

From: Jacob Felson
Date: 6 Jan 2011, 01:01 AM
Subject: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

I wonder if anyone can explain the following scenario. I'm wondering why the direction in which clustering changes a standard error depends on whether another control variable is included. My question is theoretical rather than practical: I'm not asking "what should I do?" but simply "why is this happening?" Let me elaborate below.

Consider the following variables:

y - the dependent variable
x - the independent variable of greatest interest, which is moderately correlated with y and with z
z - another independent variable, which is correlated with y at about 0.5
nation - the data was collected in 4 different nations by different organizations

I am examining the standard errors (SE) for the coefficient of variable x from the following four models:

1. Regress y on x, without clustering on nation.
2. Regress y on x, with clustering on nation.

3. Regress y on x and z, without clustering on nation.
4. Regress y on x and z, with clustering on nation.
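For readers who want to reproduce this kind of comparison outside Stata, here is a minimal Python sketch of the four models on simulated data. The data-generating process and the function name `ols_and_cluster_se` are hypothetical. The cluster-robust SE is the CR1 sandwich with the factor G/(G-1)·(N-1)/(N-K), which is my understanding of what `regress ..., vce(cluster nation)` computes; treat it as an illustration rather than a replica of Stata's output.

```python
import numpy as np

def ols_and_cluster_se(y, X, groups):
    """OLS coefficients with conventional and cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Conventional OLS variance s^2 (X'X)^-1, which assumes i.i.d. errors
    s2 = e @ e / (n - k)
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))
    # Sum the scores X_i * e_i within each cluster, then form the "meat" of the sandwich
    labels, idx = np.unique(groups, return_inverse=True)
    G = len(labels)
    scores = np.column_stack([np.bincount(idx, weights=X[:, j] * e) for j in range(k)])
    meat = scores.T @ scores
    # Finite-sample factor G/(G-1) * (N-1)/(N-K)
    c = G / (G - 1) * (n - 1) / (n - k)
    se_cl = np.sqrt(np.diag(c * XtX_inv @ meat @ XtX_inv))
    return beta, se_ols, se_cl

# Hypothetical data: 4 nations, with z and y partly driven by a nation-level shift
rng = np.random.default_rng(42)
nation = np.repeat(np.arange(4), 300)
n = nation.size
shift = rng.normal(size=4)[nation]
z = shift + rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 0.3 * x + 0.5 * z + shift + rng.normal(size=n)
const = np.ones(n)

models = {
    "models 1/2 (y on x)": np.column_stack([const, x]),
    "models 3/4 (y on x and z)": np.column_stack([const, x, z]),
}
for name, X in models.items():
    _, se_ols, se_cl = ols_and_cluster_se(y, X, nation)
    print(f"{name}: SE(x) unclustered = {se_ols[1]:.4f}, clustered = {se_cl[1]:.4f}")
```

Models 1 and 3 correspond to the unclustered SEs and models 2 and 4 to the clustered ones. It is worth re-running with different seeds: with only 4 clusters the clustered SE varies considerably from sample to sample.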

The SE of the coefficient for x is LARGER in model 2 than in model 1. This suggests a positive intracluster (within-nation) correlation: the residuals are more similar to each other within nations than we would expect by chance alone. I suppose there is a preponderance of positive residuals in some nations and a preponderance of negative residuals in others.

The SE of the coefficient for x is SMALLER in model 4 than in model 3. This suggests a negative intracluster correlation: the residuals are less similar to each other within nations than we would expect by chance.

So the effect that clustering on nation has on the SE of x depends on whether a third variable, z, is controlled for. Why is this?
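One way to make this intuition precise is the textbook cluster-robust (sandwich) formula written for a single coefficient; this is a sketch, not a diagnosis of these particular data. By the Frisch-Waugh-Lovell theorem, let x̃_i be the residual from regressing x on the other regressors (the constant in models 1 and 2; the constant and z in models 3 and 4), e_i the OLS residual, N the sample size, K the number of regressors, G the number of nations, and m the observations per nation (equal sizes assumed for the last line). The last line is the Moulton approximation, which assumes equicorrelated (random-effects-type) regressor and error.

```latex
\begin{aligned}
\widehat{V}_{\mathrm{OLS}}(\hat\beta_x) &= \frac{s^2}{\sum_i \tilde x_i^2},
  \qquad s^2 = \frac{\sum_i e_i^2}{N-K} \\
\widehat{V}_{\mathrm{CL}}(\hat\beta_x) &= c\,\frac{\sum_{g=1}^{G}\bigl(\sum_{i \in g} \tilde x_i e_i\bigr)^2}{\bigl(\sum_i \tilde x_i^2\bigr)^2},
  \qquad c = \frac{G}{G-1}\cdot\frac{N-1}{N-K} \\
\sum_{g}\Bigl(\sum_{i \in g} \tilde x_i e_i\Bigr)^2
  &= \sum_i \tilde x_i^2 e_i^2
   + \sum_{g}\sum_{i \in g}\sum_{\substack{j \in g \\ j \neq i}} \tilde x_i e_i\,\tilde x_j e_j \\
\frac{V_{\mathrm{CL}}}{V_{\mathrm{OLS}}} &\approx 1 + (m-1)\,\rho_{\tilde x}\,\rho_{e}
\end{aligned}
```

Read this way, clustering raises the SE of x when the within-nation cross products x̃_i e_i x̃_j e_j sum to a positive number and lowers it when they sum to a negative one (the first term also differs from the conventional formula if the errors are heteroskedastic). The relevant quantity is therefore the within-nation correlation of x̃ times that of e, not the correlation of the residuals alone, and adding z changes both x̃ and e. Note also that at the OLS solution Σ_i x̃_i e_i = 0, so with G = 4 the four cluster sums add to zero and the clustered variance rests on only three free numbers, which makes its direction relative to the conventional SE fragile.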