

From: "Tobias Pfaff" <tobias.pfaff@uni-muenster.de>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: Problem with IV regression and two-way clustering

Date: Thu, 27 Sep 2012 21:29:40 +0200

Dear Statalisters,

I would kindly ask you for comments on an instrumental-variables regression with (two-way) clustered standard errors, which is a challenge for me. I'm afraid that the whole problem cannot be written in just a few lines. Below is the whole story (which is hopefully interesting to some of you). Any help is greatly appreciated!

Now the setting:

Unbalanced individual panel data set, single country
Obs.: 170,000
Individuals: 28,000
Regions: 14
Years: 9
Dependent variable measured at the individual level
Independent variable of interest (focusvar) measured at the regional level
Further control variables: 10, all at the individual level, plus region and year dummies (20 dummies)

I use individual fixed effects and cluster at the individual level to control for correlation of the errors over time. With this specification, my focus variable is significant:

-xtivreg2 depvar focusvar controlvars, fe cluster(pid)-

My focus variable is aggregated at a higher level (region) than the dependent variable (individual), and I know from Moulton (1990) that my standard errors can be biased downwards dramatically if I do not cluster at the regional level. Additionally, Donald and Lang (2007) say that without clustering at the regional level, I dramatically overstate the significance of the coefficients. Therefore, I use two-way clustering at the individual and the regional level:

-xtivreg2 depvar focusvar controlvars, fe cluster(pid region)-

Now my focus variable is insignificant. However, the number of clusters is small (14), which again leads to biased results (Donald and Lang 2007). Cameron et al. (2008) tell me that "With a small number of clusters the cluster-robust standard errors are downwards biased" (p. 414). Since my focus variable is already insignificant, I would expect it to be even less significant if I corrected for the bias induced by the small number of clusters, so I conclude that I find no evidence for significance.
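A minimal sketch of the Moulton (1990) point, using the sample sizes from the setting above. It assumes the textbook simplification of equal-sized clusters and a regressor that is constant within each region, in which case the conventional variance is understated by roughly the factor 1 + (m - 1) * rho, where m is the cluster size and rho is the intraclass correlation of the errors. The rho values are hypothetical and chosen only for illustration; the actual panel is unbalanced and uses individual fixed effects, so this is an order-of-magnitude guide and not the correction itself.

```python
def moulton_factor(avg_cluster_size, rho):
    """Variance inflation for a regressor that is constant within clusters.

    Assumes equal-sized clusters; rho is the intraclass correlation
    of the regression errors within a cluster.
    """
    return 1.0 + (avg_cluster_size - 1.0) * rho


if __name__ == "__main__":
    n_obs = 170_000      # observations, from the post
    n_regions = 14       # regional clusters, from the post
    avg_size = n_obs / n_regions

    # Hypothetical intraclass correlations (not estimated from any data).
    for rho in (0.001, 0.01, 0.05):
        variance_inflation = moulton_factor(avg_size, rho)
        se_inflation = variance_inflation ** 0.5
        print(f"rho={rho}: variance x{variance_inflation:.1f}, "
              f"standard error x{se_inflation:.1f}")
```

With clusters this large, even a very small within-region error correlation implies a substantial understatement of the standard error, which is consistent with the loss of significance once region enters the cluster() option.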
Now comes the challenge (as if that were not enough): I want to do an IV regression to make sure that my results are not influenced by endogeneity bias. I found a variable at the regional level which is theoretically a fine instrument for my regional focus variable. The correlation between the focus variable and the instrument is .60. I now estimate the IV model with two-way clustered standard errors:

-xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid region) first-

The size of the coefficient of my focus variable has decreased. The standard errors have increased drastically, and the coefficient is far from significant. In the first-stage regression, the instrument is not significant. The tests say that the instrument is weak, and I cannot reject the null of underidentification. I interpret this as evidence that I have a bad instrument or that my focus variable is not endogenous.

However, a different picture appears when I only cluster at the individual level:

-xtivreg2 depvar (focusvar = instrumentvar) controlvars, fe cluster(pid) first-

The standard errors of my focus variable are still much larger than in the non-IV estimates, but smaller than in the IV estimates with two-way clustering. The focus variable is again not significant. The instrument is highly significant in the first-stage regression. The tests indicate that the hypotheses of a weak instrument and of underidentification can be rejected. I would interpret this as evidence that my instrument is valid and that my focus variable is endogenous.

Conclusion: My interpretation is that the results generally suggest that my focus variable is not significant.

Open questions:
Is my interpretation wrong?
Is my instrument good or bad - should I trust the results from the one-way or the two-way clustering for the IV approach?
In case I want to cluster at the regional level and correct for the bias due to a small number of clusters, I could use wild bootstrapping as proposed by Cameron et al. (2008), but does that work for IV as well?

Thanks very much for any clarification,
Tobias

Cited literature:
Cameron, Gelbach, Miller (2008), Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90 (3), 414-427.
Donald, Lang (2007), Inference with Difference-in-Differences and Other Panel Data. The Review of Economics and Statistics, 89 (2), 221-233.
Moulton (1990), An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units. The Review of Economics and Statistics, 72 (2), 334-338.
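For readers unfamiliar with the wild bootstrap raised in the last open question, here is a minimal sketch of a wild cluster bootstrap-t in the spirit of Cameron, Gelbach and Miller (2008), for the plain OLS case only. It assumes a single regressor plus intercept, Rademacher weights drawn once per cluster, the null hypothesis (slope = 0) imposed when forming residuals, and no finite-sample correction in the cluster-robust variance. The function names and the simulated data are hypothetical. It does not cover the IV case and does not answer whether the method carries over to IV.

```python
import random


def slope_and_cluster_t(y, x, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust t-statistic."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xd = [xi - xbar for xi in x]                  # x with the intercept partialled out
    sxx = sum(v * v for v in xd)
    b = sum(v * yi for v, yi in zip(xd, y)) / sxx
    a = ybar - b * xbar
    scores = {}                                   # sum of xd * residual within each cluster
    for v, yi, xi, g in zip(xd, y, x, cluster):
        resid = yi - a - b * xi
        scores[g] = scores.get(g, 0.0) + v * resid
    var_b = sum(s * s for s in scores.values()) / (sxx * sxx)
    return b, b / var_b ** 0.5


def wild_cluster_bootstrap_p(y, x, cluster, reps=999, seed=0):
    """Two-sided bootstrap p-value for H0: slope = 0."""
    rng = random.Random(seed)
    _, t_obs = slope_and_cluster_t(y, x, cluster)
    ybar = sum(y) / len(y)
    resid0 = [yi - ybar for yi in y]              # residuals with the null imposed
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        # One Rademacher draw per cluster, shared by all its observations.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [ybar + w[g] * u for u, g in zip(resid0, cluster)]
        _, t_star = slope_and_cluster_t(y_star, x, cluster)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return exceed / reps


if __name__ == "__main__":
    # Simulated data: 14 clusters, regressor constant within cluster, true slope 0.
    rng = random.Random(1)
    cluster = [g for g in range(14) for _ in range(50)]
    x_by_region = [rng.gauss(0, 1) for _ in range(14)]
    shock_by_region = [rng.gauss(0, 1) for _ in range(14)]
    x = [x_by_region[g] for g in cluster]
    y = [shock_by_region[g] + rng.gauss(0, 1) for g in cluster]
    print(wild_cluster_bootstrap_p(y, x, cluster))
```

The bootstrap compares the observed t-statistic with t-statistics recomputed on resampled outcomes, so the reference distribution reflects the small number of clusters instead of relying on the asymptotic normal approximation.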

**Follow-Ups**: **Re: st: Problem with IV regression and two-way clustering** *From:* Austin Nichols <austinnichols@gmail.com>
