Let's discuss this with an example: suppose you want to measure the
effect of union status on wages and you have a panel of workers that
you observe over time. There are (at least) two concerns with just
using OLS on the whole sample.

First, you may think that there are omitted variables in your
regression, such as unobserved ability. If ability is correlated with
union status and wages, then the coefficient on union status will be
biased. Note that this is not a problem of error terms not being
independent of each other, but of error terms being correlated with
one of your right-hand-side variables. If ability is constant over
time for an individual, then including individual fixed effects will
solve the problem, as this indirectly controls for ability. Clustering
can never solve this problem. In fact, clustering doesn't affect your
coefficient estimates, only the standard errors, so it cannot deal
with omitted variable "bias", which shifts the estimates themselves.

The other problem is that when you observe individuals over time,
there is no reason why the error term should not be correlated over
time for a given individual. This means that you in fact have fewer
independent observations and thus less information. Clustering will
adjust your standard errors for this problem. Since it takes account
of the fact that there is less information than if the errors were all
independent, standard errors nearly always go up. There are other
methods of course, such as FGLS. Fixed effects can also deal with
correlated error terms, but they are pretty restrictive in this
respect (e.g. they allow only a positive uniform correlation among the
errors in one "cluster"), so usually the reason for FE is the omitted
variable bias described above, not correlation of error terms with
each other.
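
To make the omitted-variable point concrete, here is a minimal
simulation sketch. It is written in Python rather than Stata purely so
that it is self-contained. Everything in it is an assumption made up
for illustration: a true union effect of 0.5, an "ability" term drawn
once per worker, union status that is more likely for high-ability
workers, and all of the variable names. Pooled OLS absorbs the ability
gap into the union coefficient, while demeaning within worker (the
fixed-effects estimator) removes it.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def demean_within(values_by_worker):
    """Subtract each worker's own mean; returns one flat list."""
    demeaned = []
    for series in values_by_worker:
        mean = sum(series) / len(series)
        demeaned.extend(value - mean for value in series)
    return demeaned


# Hypothetical data-generating process (assumed, not from the thread).
rng = random.Random(42)
N_WORKERS, N_YEARS, TRUE_EFFECT = 1000, 5, 0.5

union_by_worker, wage_by_worker = [], []
for _ in range(N_WORKERS):
    ability = rng.gauss(0.0, 1.0)  # unobserved, constant over time
    # Higher ability makes union membership more likely in every year.
    union = [1.0 if ability + rng.gauss(0.0, 1.0) > 0 else 0.0
             for _ in range(N_YEARS)]
    wage = [TRUE_EFFECT * u + ability + rng.gauss(0.0, 0.5) for u in union]
    union_by_worker.append(union)
    wage_by_worker.append(wage)

# Pooled OLS: ignores ability, so the slope also picks up the ability gap.
pooled_estimate = ols_slope(
    [u for series in union_by_worker for u in series],
    [w for series in wage_by_worker for w in series],
)

# Within (fixed-effects) estimator: demeaning wipes out anything that is
# constant within a worker, including ability.
within_estimate = ols_slope(
    demean_within(union_by_worker),
    demean_within(wage_by_worker),
)

print("true effect:    ", TRUE_EFFECT)
print("pooled OLS:     ", round(pooled_estimate, 3))
print("within (FE) OLS:", round(within_estimate, 3))
```

In this setup the pooled slope lands well above 0.5 while the within
slope stays close to it. Clustering would change neither number, only
the standard errors attached to them.
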
I would say that among applied microeconometricians it is very much
standard today to always cluster your standard errors at the same
level at which you would use fixed effects. Of course, if you use
several fixed effects (e.g. in an education framework you might have
grade, school and year fixed effects) you have to put some thought
into the question of which level is best to cluster on. Generally, it
is not only fine to use FE and clustering together; I would go as far
as saying that not doing it is a bit fishy. I think this trend in
economics is only a couple of years old, dating from a paper by
Bertrand, Duflo and Mullainathan that pointed out the severity of this
problem. Maybe in other disciplines this is not (yet) standard.
best, Johannes
On 9/24/06, Jason Yackee <jyackee@law.usc.edu> wrote:
Johannes,
Thank you for this helpful reply. More essentially, I am wondering
whether adding FE _and_ clustering is (harmfully) redundant, where you
are clustering and "fixing" on the same id variable. So in your opinion
it is fine to FE (or to add unit dummy variables (e.g. "country")) _and_
to cluster on the same units (e.g. "country" again)?
I ask because I had been taught that clustering on your unit ID was a
"weak" first-try method of dealing with intra-group correlations, and
that adding unit fixed effects (either via the -xtreg, fe- method or a
LSDV approach) was a more radical second-try method; if the first method
(clustering) works, then stick with that; but if it doesn't, then move
on to the "stronger" FE approach, which has more inherent drawbacks than
clustering (such as forcing you to drop time-invariant, unit-specific
variables of theoretical interest).
On the other hand, it has been emphasized on this listserv in the past
that clustering works best as the number of clusters approaches infinity
(a point curiously not emphasized in the [U] manual entry on robust SEs
and clustering). Perhaps in my case 100 clusters is too small a number?
I haven't really noticed people using FE and also clustering on the same
group variable, and am worried that what I am doing is "overkill" that
is causing my SEs to be overinflated. You seem to say "no worries", and
I am very willing to take your word. But I am wondering if others might
agree?
Jason
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Johannes
Schmieder
Sent: Saturday, September 23, 2006 2:27 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: appropriateness of cluster option with xtreg, fe
My thoughts on this: without the clustering, Stata assumes that the
underlying statistical model has 100 * 25 = 2500 observations with
independent error terms. The clustering adjusts for correlations between
the error terms over time, so you have in effect fewer independent
observations and you should expect your standard errors to go up. This
is nearly always the case; the example in the FAQ you mentioned is more
the exception (you need a strong negative correlation between your error
terms and even then it is not necessarily the case that the SEs go down).
If you have reasons to believe that error terms are not independent in a
subgroup of your observations (such as for the different time periods
for a specific individual in a panel, or e.g. for observations that are
spatially close) you should always cluster your SEs.
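
The 100 * 25 = 2500 point can be sketched with a small simulation. This
is Python rather than Stata, only so that it runs on its own, and the
setup is entirely assumed for illustration: 100 units observed for 25
periods, a pooled regression without fixed effects, and a regressor and
an error term that each follow an AR(1) process with coefficient 0.8
within a unit. The cluster-robust formula below is the plain sandwich
estimator for the slope; Stata additionally applies a finite-sample
adjustment, which is omitted here, so the numbers would not match
Stata's output exactly.

```python
import math
import random


def ar1_series(rng, length, rho):
    """AR(1) series with unit variance and autocorrelation rho."""
    value = rng.gauss(0.0, 1.0)
    series = [value]
    for _ in range(length - 1):
        value = rho * value + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        series.append(value)
    return series


def slope_with_standard_errors(x_by_unit, y_by_unit):
    """Pooled OLS slope with conventional and cluster-robust SEs."""
    xs = [x for unit in x_by_unit for x in unit]
    ys = [y for unit in y_by_unit for y in unit]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    rss = 0.0   # residual sum of squares, for the conventional SE
    meat = 0.0  # sum over units of squared within-unit score totals
    for x_unit, y_unit in zip(x_by_unit, y_by_unit):
        unit_score = 0.0
        for x, y in zip(x_unit, y_unit):
            resid = y - intercept - slope * x
            rss += resid ** 2
            unit_score += (x - x_bar) * resid
        meat += unit_score ** 2

    se_iid = math.sqrt(rss / (n - 2) / sxx)  # treats all n errors as independent
    se_cluster = math.sqrt(meat) / sxx       # no small-sample correction
    return slope, se_iid, se_cluster


# Hypothetical panel (assumed values, chosen to mirror 100 * 25 = 2500).
rng = random.Random(0)
N_UNITS, N_PERIODS, RHO, TRUE_SLOPE = 100, 25, 0.8, 0.5

x_by_unit, y_by_unit = [], []
for _ in range(N_UNITS):
    x = ar1_series(rng, N_PERIODS, RHO)
    errors = ar1_series(rng, N_PERIODS, RHO)  # serially correlated errors
    x_by_unit.append(x)
    y_by_unit.append([TRUE_SLOPE * xi + ei for xi, ei in zip(x, errors)])

slope, se_iid, se_cluster = slope_with_standard_errors(x_by_unit, y_by_unit)
print("slope (same with or without clustering):", round(slope, 3))
print("conventional SE:", round(se_iid, 4))
print("clustered SE:   ", round(se_cluster, 4))
```

The slope is computed once and is the same whichever standard error is
reported. Under these assumptions the clustered standard error comes out
noticeably larger than the conventional one, because the 25 observations
on a unit carry much less than 25 observations' worth of information.
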
regards, johannes
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Johannes F. Schmieder
Ph.D. Student
Department of Economics
Columbia University
email: jfs2106@columbia.edu
cell: (+1) 631 903 5646