st: Problem with IV regression and two-way clustering

From   "Tobias Pfaff"
Subject   st: Problem with IV regression and two-way clustering
Date   Thu, 27 Sep 2012 21:29:40 +0200

Dear Statalisters,

I would kindly ask you for comments on an instrumental-variables regression
with (two-way) clustered standard errors, which is a challenge for me.
I'm afraid that the whole problem cannot be written in just a few lines.
Below is the whole story (which is hopefully interesting to some of you).

Any help is greatly appreciated!

Now the setting:

Unbalanced individual panel data set, single country
Obs.: 170,000
Individuals: 28,000
Regions: 14
Years: 9
Dependent variable measured on the individual level
Independent variable of interest (focusvar) measured on the regional level
Further control variables: 10, all at the individual level, plus region and
year dummies (20 dummies)

I use individual fixed effects and I cluster on the individual level to
control for correlation of the errors over time and get the result that my
focus variable is significant:
-xtivreg2 depvar focusvar controlvars, fe cluster(pid)-

My focus variable is aggregated at a higher level (region) than the
dependent variable (individual), and I know from Moulton (1990) that my
standard errors can be biased downwards dramatically if I do not cluster at
the regional level. Additionally, Donald and Lang (2007) say that without
clustering on the regional level, I dramatically overstate the significance
of the coefficients. Therefore, I use two-way clustering on the individual
and on the regional level:
-xtivreg2 depvar focusvar controlvars, fe cluster(pid region)-
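
As a rough, back-of-the-envelope illustration of the Moulton (1990) concern,
the variance inflation from ignoring within-region error correlation can be
approximated by 1 + (m - 1) * rho_x * rho_e for equal-sized clusters. The
sketch below plugs in the sample sizes given above. The intraclass
correlations rho_e are purely hypothetical values, and rho_x = 1 assumes a
regressor that is constant within region, which overstates the problem here
because focusvar also varies over years. It also ignores the individual fixed
effects, so it is an order-of-magnitude guide only.

```python
import math

def moulton_factor(avg_cluster_size, rho_error, rho_x=1.0):
    """Approximate variance inflation for equal-sized clusters.

    rho_error: intraclass correlation of the errors (hypothetical here).
    rho_x:     intraclass correlation of the regressor; 1.0 means the
               regressor is constant within a cluster (an assumption).
    """
    return 1.0 + (avg_cluster_size - 1.0) * rho_x * rho_error

N_OBS = 170_000      # observations, as stated in the post
N_REGIONS = 14       # regional clusters, as stated in the post
avg_size = N_OBS / N_REGIONS

for rho in (0.001, 0.01, 0.05):   # hypothetical error correlations
    vif = moulton_factor(avg_size, rho)
    print(f"rho_e={rho}: variance x{vif:.1f}, std. error x{math.sqrt(vif):.1f}")
```

With roughly 12,000 observations per region, even a very small error
correlation within regions implies a large understatement of the standard
errors under this approximation.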

Now my focus variable is insignificant. However, the number of clusters is
small (14), which again leads to biased results (Donald and Lang 2007).
Cameron et al. (2008) note that "With a small number of clusters the
cluster-robust standard errors are downwards biased" (p. 414). Since my
focus variable is already insignificant, I would expect it to be even
further from significance if I corrected for the bias induced by the
small number of clusters, and I conclude that I find no evidence for

Now comes the challenge (as if that were not enough already):
I want to do an IV regression to make sure that my results are not
influenced by endogeneity bias. I found a variable on the regional level
which is theoretically a fine instrument for my regional focus variable. The
correlation between the focus variable and the instrument is .60.

I now estimate the IV model with two-way clustered standard errors:
-xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid region) first-

The coefficient on my focus variable has become smaller. Its standard error
has increased drastically, and the coefficient is far from significant. In
the first-stage regression, the instrument is not significant. The tests
indicate that the instrument is weak, and I cannot reject the null of
underidentification. I interpret this as evidence that I have a bad
instrument or that my focus variable is not endogenous.

However, a different picture appears when I cluster only at the individual
level:
-xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid)-

The standard errors of my focus variable are still much larger than in the
non-IV estimates, but smaller than in the IV model with two-way clustering.
The focus variable is again not significant. The instrument is highly
significant in the first-stage regression. The tests indicate that the
hypotheses of a weak instrument and of underidentification can be rejected.
I would interpret this as evidence that my instrument is valid and that my
focus variable is endogenous.
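
Because clustering changes only the variance estimator and not the point
estimates, the two sets of results differ solely in their standard errors.
As a reminder of the mechanics, the two-way cluster-robust variance is
usually built as V(pid) + V(region) - V(pid-by-region intersection). The
sketch below shows this for a plain OLS slope in pure Python, not for the
fixed-effects IV estimator that xtivreg2 fits. The function names and toy
data are made up for illustration, and no finite-sample correction is
applied.

```python
def cluster_var_slope(y, x, cluster):
    """Cluster-robust variance of the OLS slope of y on x (with intercept)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]                      # demeaned regressor
    sxx = sum(v * v for v in x_dm)
    b = sum(v * yi for v, yi in zip(x_dm, y)) / sxx      # OLS slope
    resid = [(yi - y_bar) - b * v for yi, v in zip(y, x_dm)]
    scores = {}                                          # sum of x*u by cluster
    for v, u, g in zip(x_dm, resid, cluster):
        scores[g] = scores.get(g, 0.0) + v * u
    return sum(s * s for s in scores.values()) / (sxx * sxx)

def twoway_var_slope(y, x, c1, c2):
    """Two-way variance: V(c1) + V(c2) - V(intersection of c1 and c2)."""
    both = list(zip(c1, c2))
    return (cluster_var_slope(y, x, c1)
            + cluster_var_slope(y, x, c2)
            - cluster_var_slope(y, x, both))

# Toy data (hypothetical): 4 individuals, 2 periods, nobody changes region.
pid    = [1, 1, 2, 2, 3, 3, 4, 4]
region = ["A", "A", "A", "A", "B", "B", "B", "B"]
x      = [0.5, 0.7, 0.5, 0.7, 1.1, 1.4, 1.1, 1.4]       # regional regressor
y      = [1.0, 1.4, 0.8, 1.1, 2.0, 2.6, 1.7, 2.2]

v_pid = cluster_var_slope(y, x, pid)
v_region = cluster_var_slope(y, x, region)
v_two = twoway_var_slope(y, x, pid, region)
print(v_pid, v_region, v_two)
```

In this toy example every individual stays in one region, so the
intersection clusters coincide with pid and the two-way variance collapses
to the region-clustered variance. In that case the two-way results are driven
by the 14 regional clusters alone.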

My interpretation is that the results generally suggest that my focus
variable is not significant.

Open questions:
Is my interpretation wrong?
Is my instrument good or bad - should I trust the results from the one-way
or two-way clustering for the IV approach?
If I want to cluster on the regional level and correct for the bias due to
the small number of clusters, I could use the wild bootstrap as proposed by
Cameron et al. (2008), but does that work for IV as well?
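
For reference, the OLS version of the wild cluster bootstrap-t, with the
null imposed and Rademacher weights, can be sketched as follows. This is a
minimal pure-Python illustration for one regional regressor and a test of a
zero slope. It is not an IV procedure, so it does not settle whether the
method carries over to IV. The function names, the seeds, and the simulated
data are hypothetical.

```python
import random

def slope_and_cluster_t(y, x, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust t-stat."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in x_dm)
    b = sum(v * yi for v, yi in zip(x_dm, y)) / sxx
    resid = [(yi - y_bar) - b * v for yi, v in zip(y, x_dm)]
    scores = {}
    for v, u, g in zip(x_dm, resid, cluster):
        scores[g] = scores.get(g, 0.0) + v * u
    var_b = sum(s * s for s in scores.values()) / (sxx * sxx)
    return b, b / var_b ** 0.5

def wild_cluster_bootstrap_p(y, x, cluster, reps=999, seed=12345):
    """Bootstrap p-value for H0: slope = 0, resampling restricted residuals."""
    rng = random.Random(seed)
    _, t_obs = slope_and_cluster_t(y, x, cluster)
    y_bar = sum(y) / len(y)                 # restricted fit under H0
    resid_r = [yi - y_bar for yi in y]      # restricted residuals
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        # One Rademacher draw (+1 or -1) per cluster, shared by its members.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [y_bar + w[g] * u for u, g in zip(resid_r, cluster)]
        _, t_star = slope_and_cluster_t(y_star, x, cluster)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return exceed / reps

# Simulated data (hypothetical): 14 regions, 30 observations each,
# a common regional shock, and a true slope of zero.
sim = random.Random(2012)
cluster, x, y = [], [], []
for g in range(14):
    x_g = sim.gauss(0, 1)
    shock = sim.gauss(0, 1)
    for _ in range(30):
        cluster.append(g)
        x.append(x_g)
        y.append(shock + sim.gauss(0, 1))

p_value = wild_cluster_bootstrap_p(y, x, cluster)
print("wild cluster bootstrap p-value:", p_value)
```

The bootstrap compares the observed t-statistic with its resampled
distribution instead of relying on asymptotic critical values, which is the
point of the procedure when there are only 14 clusters.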

Thanks very much for any clarification,

Cited literature:
Cameron, Gelbach, Miller (2008), Bootstrap-Based Improvements for Inference
with Clustered Errors. The Review of Economics and Statistics, 90 (3),
Donald, Lang (2007), Inference with Difference-in-Differences and Other
Panel Data. The Review of Economics and Statistics, 89 (2), 221-233.
Moulton (1990), An Illustration of a Pitfall in Estimating the Effects of
Aggregate Variables on Micro Units. The Review of Economics and Statistics,
72 (2), 334-338.
