

From: "Patrick Button" <pbutton@uci.edu>

To: statalist@hsphsun2.harvard.edu

Subject: st: Polynomial Fitting and RD Design

Date: Wed, 31 Aug 2011 18:54:52 -0700

Hello Stata users,

I've been getting some unexpected Stata output when fitting polynomials using a pretty simple OLS regression. I am replicating a regression discontinuity design paper (Lee, Moretti and Butler 2004).

The paper is here: http://emlab.berkeley.edu/~moretti/final.pdf

Code and data are here: http://emlab.berkeley.edu/~moretti/data3.html (I am using enricoall2.dta).

I need to run a regression that fits a 4th degree polynomial separately for points of the running variable, x, below 0.5 and above 0.5. The regression includes a dummy variable for if x >= 0.5 or not as well. If there is a discontinuity at 0.5, then this is picked up in the coefficient on that dummy variable.

In this case the running variable is the vote share that the Democratic candidate got in U.S. House of Representatives elections, including just the Democratic and Republican votes. So x < 0.5 means a Republican won, and >= 0.5 means a Democrat won.

I would like to pool the data instead of running a separate regression for each side. This is one of the recommended methods in the RD literature. For some reason this method does not appear in the authors' code so I need to do it myself.

I'm running and setting up the regression as follows:

***
gen x = demvoteshare
gen D = 1 if x >=0.5
replace D = 0 if x < 0.5

*Left Side Polynomial
gen xa = (1-D)*x
gen x2a = (1-D)*x^2
gen x3a = (1-D)*x^3
gen x4a = (1-D)*x^4

*Right Side Polynomial
gen xb = D*x
gen x2b = D*x^2
gen x3b = D*x^3
gen x4b = D*x^4

regress realincome D xa x2a x3a x4a xb x2b x3b x4b
***

Based on what the authors of the paper got, graphical analysis, and logic, there should be no jump in realincome at 0.5. There is no reason why income should be suddenly much different for districts that Democrats just barely won or just barely lost. If it is, this invalidates the regression discontinuity design. So the coefficient on D should be statistically insignificant.
However, I get the following results:

------------------------------------------------------------------------------
  realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |   497414.5   94802.12     5.25   0.000       311589    683240.1
          xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
         x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
         x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
         x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
          xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
         x2b |    5828381    1112850     5.24   0.000      3647038     8009724
         x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
         x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
       _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
------------------------------------------------------------------------------

I have no idea why D is statistically significant, and why only the polynomial on the right side is statistically significant.

This is not just a problem with this regression. I get messed up results for every regression I run that has a 4th degree polynomial on each side of 0.5. However, I do not get weird results like this when I use just one 4th degree polynomial (one for the entire thing) with the D dummy.

Does anyone know what I am doing wrong? I have no idea but I have a feeling that I'm missing something obvious.

Thank you very much for your time and consideration.

--
Patrick Button
Ph.D. Student
Department of Economics
University of California, Irvine
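One possible explanation, offered as a sketch and not as the thread's confirmed answer: because the polynomial terms are built from the raw vote share x, the coefficient on D is the gap between the two fitted curves extrapolated to x = 0, not the gap at the cutoff. The gap at x = 0.5 is instead D plus 0.5^k times (right-side coefficient minus left-side coefficient) for each power k = 1, ..., 4. The Stata lines below assume the variables from the post (realincome, x, D, and xa through x4b) are already in memory; xc and the xc* variables are hypothetical names introduced only for illustration.

```stata
* Sketch only. Run from a do-file, since /// is a do-file line continuation.

* Option 1: keep the regression from the post and estimate the implied gap
* between the two fitted polynomials at the cutoff x = 0.5.
regress realincome D xa x2a x3a x4a xb x2b x3b x4b
lincom D + 0.5*xb - 0.5*xa + 0.25*x2b - 0.25*x2a ///
    + 0.125*x3b - 0.125*x3a + 0.0625*x4b - 0.0625*x4a

* Option 2: center the running variable at the cutoff, so that the
* coefficient on D is itself the gap at x = 0.5.
* (xc and the xc* variables are hypothetical names, not from the post.)
gen xc = x - 0.5

* Left-side polynomial in the centered variable
gen xca  = (1-D)*xc
gen xc2a = (1-D)*xc^2
gen xc3a = (1-D)*xc^3
gen xc4a = (1-D)*xc^4

* Right-side polynomial in the centered variable
gen xcb  = D*xc
gen xc2b = D*xc^2
gen xc3b = D*xc^3
gen xc4b = D*xc^4

regress realincome D xca xc2a xc3a xc4a xcb xc2b xc3b xc4b
```

Both options fit the same pair of quartics, so the lincom estimate in option 1 and the coefficient on D in option 2 should agree. Whether that estimate turns out to be statistically insignificant for this data is not something the output in this message can show.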

**Follow-Ups**: **Re: st: Polynomial Fitting and RD Design** *From:* Austin Nichols <austinnichols@gmail.com>
