Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

From: Justina Fischer <[email protected]>
To: [email protected]
Subject: Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable
Date: Thu, 6 Jan 2011 01:20:26 +0100

Hi,

Just two caveats:

1) When using the cluster option, you need a decent number of clusters to benefit from its properties (Kit can probably shed more light on this). I would guess that 4 clusters is far from enough.

2) In cross-sectional micro data, I would use the cluster option when my variable of interest varies only across countries, since the standard errors are then corrected for this. An example would be an institution such as democracy.
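
Point 2 is often summarized by the Moulton factor. As a rough sketch, assuming equal cluster sizes n and errors that are equicorrelated within clusters (both assumptions are added here and are not stated in this thread), the ratio of the cluster-corrected variance to the conventional one is approximately

$$
\frac{\operatorname{Var}_{\text{cluster}}(\hat\beta)}{\operatorname{Var}_{\text{conventional}}(\hat\beta)} \approx 1 + (n - 1)\,\rho_x\,\rho_e
$$

where \rho_x is the intraclass correlation of the regressor and \rho_e that of the errors. A regressor that varies only across countries has \rho_x = 1, which is the case where the correction matters most. The symbols n, \rho_x and \rho_e are notation introduced for this illustration.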

To your question: is z a vector of country characteristics in your micro model? That could possibly explain your finding.

Justina

[email protected] wrote: -----

To: [email protected]
From: Jacob Felson <[email protected]>
Sent by: [email protected]
Date: 06.01.2011 01:01AM
Subject: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

I wonder if anyone might be able to explain the
following scenario.  I'm wondering why the direction of the change in
a standard error caused by using the cluster option depends on
whether another control variable is included.  My inquiry is more
theoretical than practical: I'm not asking "what should I do?"
but rather, simply, "why is this happening?"  Let me elaborate below.

Consider the following variables:

y, the dependent variable
x, the independent variable of greatest interest, which is moderately
correlated with y and with z
z, another independent variable, which is correlated with y at about 0.5.

nation - the data was collected in 4 different nations by different
organizations.

I am examining the standard errors (SE) for the coefficient of
variable x from the following four models:

1. Regress y on x, without clustering on nation.
2. Regress y on x, with clustering on nation.

3. Regress y on x and z without clustering on nation.
4. Regress y on x and z with clustering on nation.

The SE of the coefficient for x is LARGER in model 2 than in model 1.
This suggests there is a positive intracluster correlation.  That is,
the residuals are more similar to each other within nations than we
would expect by chance alone.  I suppose there is a preponderance of
positive residuals in some nations and a preponderance of negative
residuals in other nations.
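
The comparison between models 1 and 2 can be made concrete with a minimal sketch. It is written in plain Python rather than Stata, covers only the one-regressor case, and uses made-up data; the function name, the variable names and both data sets are hypothetical. The finite-sample factor is the one I understand Stata's regress to document for clustered standard errors, which is an assumption here rather than something taken from this thread.

```python
import math
from collections import defaultdict


def slope_standard_errors(y, x, cluster):
    """Return (slope, classical SE, cluster-robust SE) for y = a + b*x + e."""
    n, k = len(y), 2  # k counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]  # x with the constant partialled out
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for yi, xi in zip(y, x)]

    # Classical SE: assumes independent, homoskedastic errors.
    se_classical = math.sqrt(sum(e * e for e in resid) / (n - k) / sxx)

    # Cluster-robust SE: sum the products x_dev * resid within each cluster
    # first, then square the cluster totals.
    score = defaultdict(float)
    for d, e, g in zip(x_dev, resid, cluster):
        score[g] += d * e
    n_clusters = len(score)
    # Finite-sample factor (assumed to match Stata's documented form).
    correction = (n - 1) / (n - k) * n_clusters / (n_clusters - 1)
    se_cluster = math.sqrt(correction * sum(s * s for s in score.values())) / sxx
    return slope, se_classical, se_cluster


# Hypothetical data set A: 4 nations, x constant within nation, and
# residuals that share a sign within each nation.
x_a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
nation_a = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y_a = [1, 1, 1, 0, 0, 0, 1, 1, 1, 4, 4, 4]

# Hypothetical data set B: residuals alternate in sign within each nation.
x_b = [0, 0, 1, 1, 2, 2, 3, 3]
nation_b = [1, 1, 2, 2, 3, 3, 4, 4]
y_b = [1, -1, 2, 0, 3, 1, 4, 2]

for label, y, x, g in [("A", y_a, x_a, nation_a), ("B", y_b, x_b, nation_b)]:
    b, se_ols, se_cl = slope_standard_errors(y, x, g)
    print(f"data {label}: slope={b:.3f} classical={se_ols:.3f} clustered={se_cl:.3f}")
```

In data set A the clustered SE comes out larger than the classical one, and in data set B it comes out smaller (there the within-nation products cancel exactly, which is an extreme case chosen only to keep the arithmetic easy to check).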

The SE of the coefficient for x is SMALLER in model 4 than in model 3.
This suggests there is a negative intracluster correlation.  That is,
the residuals are less similar to each other within nations than we
would expect by chance.

So the effect that clustering on nation has on the SE of x depends on
whether a third variable, z, is controlled for.  Why is this?
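
One way to make the question concrete is to write out the clustered variance estimate for the coefficient on x. What follows is a sketch using the usual sandwich form, with the finite-sample factor I understand Stata to document for regress; the notation (\tilde x_i, e_i, N, K, G) is introduced here and does not come from the original post.

$$
\widehat{\operatorname{Var}}_{\text{cluster}}(\hat\beta_x)
= \frac{N-1}{N-K}\,\frac{G}{G-1}\;
\frac{\sum_{g=1}^{G}\Bigl(\sum_{i \in g}\tilde x_i e_i\Bigr)^{2}}
     {\Bigl(\sum_{i=1}^{N}\tilde x_i^{2}\Bigr)^{2}}
$$

Here \tilde x_i is x after partialling out the other regressors (the constant alone in models 1 and 2, the constant and z in models 3 and 4), e_i is the OLS residual, N is the number of observations, K the number of estimated coefficients and G the number of clusters. Expanding the square in the numerator gives

$$
\sum_{g=1}^{G}\Bigl(\sum_{i \in g}\tilde x_i e_i\Bigr)^{2}
= \sum_{i=1}^{N}\tilde x_i^{2} e_i^{2}
+ \sum_{g=1}^{G}\sum_{\substack{i,j \in g \\ i \neq j}} \tilde x_i e_i\,\tilde x_j e_j
$$

The first sum is what a heteroskedasticity-robust estimate would keep; the second collects the within-cluster cross products. Under this reading, clustering tends to raise the SE when those cross products are positive on balance and to lower it when they are negative. Because adding z changes both \tilde x_i and e_i, the sign of the cross-product term need not be the same in the two pairs of models. What matters is the within-nation correlation of the products \tilde x_i e_i, not of the residuals alone.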
