Reply: RE: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

Date

Thu, 6 Jan 2011 12:52:46 +0100

Hi Eric,

Same question to you: how does the minimum number of clusters change when the estimator is non-linear? Most examples and simulations appear to be for OLS or GLS estimators.

You can find a good presentation of cluster sampling issues (Lecture 7), and indeed an excellent course on recent developments in econometrics, here: http://www.cemmap.ac.uk/resources/resources25.php Both the slides and the lecture notes are available. The level assumes a good grounding in econometrics.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jacob Felson
Sent: 06 January 2011 01:30
To: [email protected]
Subject: Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

Justina,

Thank you for your input. Is it improper to use the cluster command with only 4 nations? I will attempt #2.

Jacob Felson

On Wed, Jan 5, 2011 at 7:20 PM, Justina Fischer <[email protected]> wrote:
> Hi,
>
> just two caveats
>
> 1) When using the cluster option, you should have a decent number of
> clusters to profit from its beneficial characteristics (Kit can
> probably shed light on this). I guess 4 clusters is far from large....
>
> 2) In cross-sectional micro data, I would use the cluster option when
> my variable of interest varies only across countries, as standard
> errors are then corrected for this. For example, this could be an
> institution, like democracy.
>
> To your question: is z a vector of country characteristics in your
> micro model? That could possibly explain your finding...
>
> Justina
>
>
> [email protected] wrote: -----
>
> To: [email protected]
> From: Jacob Felson <[email protected]>
> Sent by: [email protected]
> Date: 06.01.2011 01:01AM
> Subject: st: Direction of the effect of the cluster command on the
> standard error depends on the inclusion of a control variable
>
> I wonder if anyone might be able to provide an explanation for the
> following scenario. I'm wondering why the direction of the change in
> a standard error affected by the use of the cluster command depends on
> whether another control variable is included. My inquiry is more
> theoretical than practical, as I'm not wondering "what I should do"
> but rather, simply "why is this happening?" Let me elaborate below.
>
> Consider the following variables:
>
> y, the dependent variable
> x, the independent variable of greatest interest, which is moderately
> correlated with y and with z
> z, another independent variable, which is correlated with y at about
> 0.5.
>
> nation - the data were collected in 4 different nations by different
> organizations.
>
>
> I am examining the standard errors (SE) for the coefficient of
> variable x from the following four models:
>
> 1. Regress y on x, without clustering on nation.
> 2. Regress y on x, with clustering on nation.
>
> 3. Regress y on x and z without clustering on nation.
> 4. Regress y on x and z with clustering on nation.
>
>
> The SE of the coefficient for x is LARGER in model 2 than in model 1.
> This suggests there is a positive intracluster correlation. That is,
> the residuals are more similar to each other within nations than we
> would expect by chance alone. I suppose there is a preponderance of
> positive residuals in some nations and a preponderance of negative
> residuals in other nations.
>
> The SE of the coefficient for x is SMALLER in model 4 than in model 3.
> This suggests there is a negative intracluster correlation. That is,
> the residuals are less similar to each other within nations than we
> would expect by chance.
>
>
> So the effect that clustering on nation has on the SE of x depends on
> whether a third variable, z, is controlled. Why is this?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
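
For anyone who wants to experiment with the four-model comparison described in the original question, here is a minimal sketch in Python (standard library only). It is not the poster's data or code: the simulated data-generating process, its coefficients, and the helper names `ols_se` and `simulate` are all assumptions made for illustration. The clustered SE uses the sandwich form with the finite-sample factor G/(G-1) * (N-1)/(N-K), which, to the best of my knowledge, is the adjustment Stata applies for -regress- with clustering.

```python
import math
import random


def solve_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ols_se(X, y, cluster):
    """Return (beta, classical_se, cluster_se) for OLS of y on the columns of X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    bread = solve_inverse(xtx)  # (X'X)^-1
    beta = [sum(bread[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]

    # Classical variance: s^2 (X'X)^-1
    s2 = sum(u * u for u in resid) / (n - k)
    classical = [math.sqrt(s2 * bread[i][i]) for i in range(k)]

    # Cluster-robust "meat": sum over clusters of (X_g'u_g)(X_g'u_g)'
    scores = {}
    for r, u, g in zip(X, resid, cluster):
        s = scores.setdefault(g, [0.0] * k)
        for i in range(k):
            s[i] += r[i] * u
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)]
            for i in range(k)]
    G = len(scores)
    adj = (G / (G - 1)) * ((n - 1) / (n - k))  # finite-sample adjustment
    V = matmul(matmul(bread, meat), bread)
    clustered = [math.sqrt(max(adj * V[i][i], 0.0)) for i in range(k)]
    return beta, classical, clustered


def simulate(seed=1, n_per_nation=200):
    """Hypothetical data: 4 nations, z shifted by nation, x correlated with z."""
    rng = random.Random(seed)
    rows = []
    for nation in range(4):
        shift = rng.gauss(0, 1)  # assumed nation-level shift in z
        for _ in range(n_per_nation):
            z = shift + rng.gauss(0, 1)
            x = 0.5 * z + rng.gauss(0, 1)
            y = 0.3 * x + 0.8 * z + rng.gauss(0, 1)
            rows.append((nation, x, z, y))
    return rows


rows = simulate()
nation = [r[0] for r in rows]
y = [r[3] for r in rows]
models = (("y on x", [[1.0, r[1]] for r in rows]),
          ("y on x, z", [[1.0, r[1], r[2]] for r in rows]))
for label, X in models:
    beta, se_iid, se_cl = ols_se(X, y, nation)
    print(f"{label}: b_x={beta[1]:.3f}  classical SE={se_iid[1]:.4f}  "
          f"clustered SE={se_cl[1]:.4f}")
```

Changing `seed` or the assumed coefficients shows how unstable the comparison is: with only 4 clusters the "meat" of the sandwich is a sum of just four terms, so the clustered SE is itself very noisy and can land above or below the classical SE depending on which regressors are in the model.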