RE: st: appropriateness of cluster option with xtreg, fe
Thank you for this helpful reply. More fundamentally, I am wondering
whether adding FE _and_ clustering is (harmfully) redundant when you
are clustering and "fixing" on the same id variable. So in your opinion
it is fine to use FE (or to add unit dummy variables, e.g. for "country")
_and_ to cluster on the same units (e.g. "country" again)?
I ask because I had been taught that clustering on your unit ID was a
"weak" first-try method of dealing with intra-group correlations, and
that adding unit fixed effects (either via the -xtreg, fe- method or a
LSDV approach) was a more radical second-try method; if the first method
(clustering) works, then stick with that; but if it doesn't, then move
on to the "stronger" FE approach, which has more inherent drawbacks than
clustering (such as forcing you to drop time-invariant, unit-specific
variables of theoretical interest).
On the other hand, it has been emphasized on this listserv in the past
that clustering works best as the number of clusters approaches infinity
(a point curiously not emphasized in the [U] manual entry on robust SEs
and clustering). Perhaps in my case 100 clusters is too small a number?
I haven't really noticed people using FE and also clustering on the same
group variable, and am worried that what I am doing is "overkill" that
is causing my SEs to be overinflated. You seem to say "no worries", and
I am very willing to take your word. But I am wondering if others might
[mailto:email@example.com] On Behalf Of Johannes
Sent: Saturday, September 23, 2006 2:27 PM
Subject: Re: st: appropriateness of cluster option with xtreg, fe
My thoughts on this: without the clustering, Stata assumes that the
underlying statistical model has 100 * 25 = 2500 observations with
independent error terms. The clustering adjusts for correlations between
the error terms over time, so you have in effect fewer independent
observations and you should expect your standard errors to go up. This
is nearly always the case; the example in the FAQ you mentioned is more
the exception (you need a strong negative correlation between your error
terms, and even then it is not necessarily the case that the SEs go down).
If you have reasons to believe that error terms are not independent in a
subgroup of your observations (such as for the different time periods
for a specific individual in a panel, or e.g. for observations that are
close) you should always cluster your SEs.
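
As a rough illustration of the comparison discussed above, the sketch below fits the same fixed-effects model twice, once with conventional standard errors and once with standard errors clustered on the panel identifier. It is only a sketch under stated assumptions: it uses Stata's nlswork example dataset (with its variables idcode, year, ln_wage, age and tenure) as a stand-in, not the 100-unit, 25-period data from this thread, and it uses the -vce(cluster ...)- syntax of later Stata releases; older releases take -cluster(idcode)- instead.

```stata
* Sketch only: nlswork is a Stata example dataset, not the data from this thread
webuse nlswork, clear
xtset idcode year

* Fixed effects with conventional standard errors
xtreg ln_wage age tenure, fe
estimates store fe_conv

* Same model, standard errors clustered on the panel identifier
xtreg ln_wage age tenure, fe vce(cluster idcode)
estimates store fe_clust

* Show coefficients and standard errors from both fits side by side
estimates table fe_conv fe_clust, b(%9.4f) se(%9.4f)
```

The point estimates should match across the two columns, since clustering changes only the estimated standard errors; how much those move depends on how strongly the errors are correlated within each unit.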