Reply: Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

Date

Thu, 6 Jan 2011 09:49:19 +0100

Hi Austin,

Very informative presentation.

I am wondering, however, whether the minimum number of clusters depends on the type of estimator, namely whether we can expect it to be higher for, e.g., xtlogit.

Is it also affected by nonlinearity in the model specification? Any insights on this?

To: statalist@hsphsun2.harvard.edu
From: Austin Nichols <austinnichols@gmail.com>
Sent by: owner-statalist@hsphsun2.harvard.edu
Date: 06.01.2011 02:53AM
Subject: Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable

Jacob Felson <felsonj@gmail.com> : You should have at least 20 clusters and your smallest cluster should be at least 5% of the data (i.e. 20 balanced clusters, or more unbalanced clusters; see e.g. http://www.stata.com/meeting/13uk/nichols_crse.pdf) to feel comfortable with the cluster-robust SE estimator. But to answer your original question, the residuals are quite different after you include z as a regressor, so the intracluster correlation can also be quite different.
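To make the point about residuals concrete, here is a minimal sketch in plain Python (not Stata) of how the SE for the coefficient on x is built from the scores, i.e. the products of residualized x and the regression residual. It is an illustration under stated assumptions, not a reproduction of what -regress- reports: the helper names `residualize` and `slope_se` are hypothetical, the toy data are made up, the comparison is against the heteroskedasticity-robust (HC0) SE rather than the default OLS SE, and the finite-sample corrections Stata applies are omitted. The cluster-robust SE sums the scores within each cluster before squaring, so it exceeds the robust SE when scores in the same cluster tend to share a sign and falls below it when they tend to cancel. Adding z changes both the residualized x and the residuals, so the within-cluster pattern of the scores can change as well.

```python
from collections import defaultdict
from math import sqrt


def residualize(v, z=None):
    """Residuals of v after regressing it on a constant (and on z, if given)."""
    n = len(v)
    v_mean = sum(v) / n
    vt = [vi - v_mean for vi in v]
    if z is None:
        return vt
    z_mean = sum(z) / n
    zt = [zi - z_mean for zi in z]
    coef = sum(a * b for a, b in zip(vt, zt)) / sum(b * b for b in zt)
    return [a - coef * b for a, b in zip(vt, zt)]


def slope_se(y, x, groups, z=None):
    """Slope on x with HC0-robust and cluster-robust SEs.

    Assumption: no finite-sample corrections are applied, so the numbers
    will not match Stata output exactly.
    """
    # Partial out the constant (and z) from both y and x.
    xt = residualize(x, z)
    yt = residualize(y, z)
    sxx = sum(a * a for a in xt)
    b = sum(a * c for a, c in zip(xt, yt)) / sxx
    # Score for each observation: residualized x times the residual.
    scores = [a * (c - b * a) for a, c in zip(xt, yt)]
    # Sum the scores within each cluster.
    cluster_sums = defaultdict(float)
    for g, s in zip(groups, scores):
        cluster_sums[g] += s
    se_robust = sqrt(sum(s * s for s in scores)) / sxx
    se_cluster = sqrt(sum(t * t for t in cluster_sums.values())) / sxx
    return b, se_robust, se_cluster


# Made-up toy data: same y and x, two different cluster assignments.
y = [-2.0, -2.0, 2.0, 2.0]
x = [-3.0, -1.0, 1.0, 3.0]
print(slope_se(y, x, ["a", "a", "b", "b"]))  # scores cancel within clusters
print(slope_se(y, x, ["a", "b", "b", "a"]))  # scores reinforce within clusters
```

Passing a list as `z` partials that variable out of both y and x first, which is how one could compare the with-z and without-z cases on one's own data.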

On Wed, Jan 5, 2011 at 8:02 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> There are terrible small sample biases exhibited by -robust- and
> -cluster()- standard errors with small # of observations and clusters,
> respectively. As was noted by Justina, four clusters is SO far away
> from asymptotics that I wouldn't even consider the clustered standard
> errors in your situation.
>
> On Wed, Jan 5, 2011 at 6:01 PM, Jacob Felson <felsonj@gmail.com> wrote:
>> I wonder if anyone might be able to provide an explanation for the
>> following scenario. I'm wondering why the direction of the change in
>> a standard error affected by the use of the cluster command depends on
>> whether another control variable is included. My inquiry is more
>> theoretical than practical, as I'm not wondering "what I should do"
>> but rather, simply "why is this happening?" Let me elaborate below.
>>
>> Consider the following variables:
>>
>> y, the dependent variable
>> x, the independent variable of greatest interest, which is moderately
>> correlated with y and with z
>> z, another independent variable, which is correlated with y at about 0.5.
>>
>> nation - the data was collected in 4 different nations by different
>> organizations.
>>
>>
>> I am examining the standard errors (SE) for the coefficient of
>> variable x from the following four models:
>>
>> 1. Regress y on x, without clustering on nation.
>> 2. Regress y on x, with clustering on nation.
>>
>> 3. Regress y on x and z without clustering on nation.
>> 4. Regress y on x and z with clustering on nation.
>>
>>
>> The SE of the coefficient for x is LARGER in model 2 than in model 1.
>> This suggests there is a positive intracluster correlation. That is,
>> the residuals are more similar to each other within nations than we
>> would expect by chance alone. I suppose there is a preponderance of
>> positive residuals in some nations and a preponderance of negative
>> residuals in other nations.
>>
>> The SE of the coefficient for x is SMALLER in model 4 than in model 3.
>> This suggests there is a negative intracluster correlation. That is,
>> the residuals are less similar to each other within nations than we
>> would expect by chance.
>>
>>
>> So the effect that clustering on nation has on the SE of x depends on
>> whether a third variable, z, is controlled. Why is this?