

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Polynomial Fitting and RD Design
Date: Thu, 1 Sep 2011 07:37:43 +0100

Even if you can get this to work as intended, look at the sizes of
those coefficients! The resultant curve may look about right, but this
is a dubious thing to do numerically and statistically.

I can't comment on the underlying scientific rationale for quartics
here, although I will guess wildly that there isn't one.

Nick

On Thu, Sep 1, 2011 at 3:59 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> Patrick Button <pbutton@uci.edu>:
> Try redefining your x so that the discontinuity is at zero.
>
> On Wed, Aug 31, 2011 at 9:54 PM, Patrick Button <pbutton@uci.edu> wrote:
>> Hello Stata users,
>>
>> I've been getting some unexpected Stata output when fitting polynomials
>> using a pretty simple OLS regression.
>>
>> I am replicating a regression discontinuity design paper (Lee, Moretti and
>> Butler 2004). The paper is here:
>> http://emlab.berkeley.edu/~moretti/final.pdf Code and data are here:
>> http://emlab.berkeley.edu/~moretti/data3.html (I am using enricoall2.dta).
>>
>> I need to run a regression that fits a 4th degree polynomial separately
>> for points of the running variable, x, below 0.5 and above 0.5. The
>> regression includes a dummy variable for if x >= 0.5 or not as well. If
>> there is a discontinuity at 0.5, then this is picked up in the coefficient
>> on that dummy variable.
>>
>> In this case the running variable is the vote share that the Democratic
>> candidate got in U.S. House of Representatives elections, including just
>> the Democratic and Republican votes. So x < 0.5 means a Republican won,
>> and >= 0.5 means a Democrat won.
>>
>> I would like to pool the data instead of running a separate regression for
>> each side. This is one of the recommended methods in the RD literature.
>> For some reason this method does not appear in the authors' code so I need
>> to do it myself.
>>
>> I'm running and setting up the regression as follows:
>>
>> ***
>> gen x = demvoteshare
>>
>> gen D = 1 if x >=0.5
>> replace D = 0 if x < 0.5
>>
>> *Left Side Polynomial
>> gen xa = (1-D)*x
>> gen x2a = (1-D)*x^2
>> gen x3a = (1-D)*x^3
>> gen x4a = (1-D)*x^4
>>
>> *Right Side Polynomial
>> gen xb = D*x
>> gen x2b = D*x^2
>> gen x3b = D*x^3
>> gen x4b = D*x^4
>>
>> regress realincome D xa x2a x3a x4a xb x2b x3b x4b
>>
>> ***
>>
>> Based on what the authors of the paper got, graphical analysis, and logic,
>> there should be no jump in realincome at 0.5. There is no reason why
>> income should be suddenly much different for districts that democrats just
>> barely won or just barely lost. If it is, this invalidates the regression
>> discontinuity design. So the coefficient on D should be statistically
>> insignificant. However, I get the following results:
>>
>> ------------------------------------------------------------------------------
>>   realincome |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>            D |   497414.5   94802.12     5.25   0.000       311589    683240.1
>>           xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
>>          x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
>>          x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
>>          x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
>>           xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
>>          x2b |    5828381    1112850     5.24   0.000      3647038     8009724
>>          x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
>>          x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
>>        _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
>> ------------------------------------------------------------------------------
>>
>> I have no idea why D is statistically significant, and why only the
>> polynomial on the right side is statistically significant. This is not
>> just a problem with this regression. I get messed up results for every
>> regression I run that has a 4th degree polynomial on each side of 0.5.
>>
>> However, I do not get weird results like this when I use just one 4th
>> degree polynomial (one for the entire thing) with the D dummy.
>>
>> Does anyone know what I am doing wrong? I have no idea but I have a
>> feeling that i'm missing something obvious.
>>
>> Thank you very much for your time and consideration.
>>
>> --
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
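
A minimal sketch of two checks that follow from Austin's suggestion, written against the variable names in Patrick's post. In the regression as posted, the coefficient on D is the gap between the two fitted quartics extrapolated to x = 0, not the jump at x = 0.5, so it is not the quantity of interest. The sketch assumes the quoted commands have already been run; the names xc and xc1a ... xc4b are hypothetical, chosen only for illustration.

```stata
* Assumes the commands in the quoted post have been run, so that x, D,
* xa ... x4b exist and the posted regression is the most recent estimation.
* The /// line continuation works in a do-file, not at the command line.

* Step 1: jump at x = 0.5 implied by the posted fit, i.e.
* (right-side quartic at 0.5) minus (left-side quartic at 0.5)
lincom D + 0.5*xb + 0.25*x2b + 0.125*x3b + 0.0625*x4b ///
         - 0.5*xa - 0.25*x2a - 0.125*x3a - 0.0625*x4a

* Step 2: recentre the running variable so the cutoff sits at zero
* (xc and xc1a ... xc4b are hypothetical names)
gen xc = x - 0.5

forvalues k = 1/4 {
    gen xc`k'a = (1 - D) * xc^`k'    // left-side polynomial term
    gen xc`k'b = D * xc^`k'          // right-side polynomial term
}

* With xc = 0 at the cutoff, the coefficient on D is the estimated jump there
regress realincome D xc1a xc2a xc3a xc4a xc1b xc2b xc3b xc4b
```

Both specifications span the same set of regressors, so the two estimates of the jump should agree up to numerical precision. Plugging the rounded coefficients from the posted table into the combination in step 1 gives a jump of roughly 1,616, against a coefficient on D of 497,414.5; the standard error reported by lincom would show whether that differs significantly from zero.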

**Follow-Ups**:
- **Re: st: Polynomial Fitting and RD Design**, from Maarten Buis <maartenlbuis@gmail.com>
- **Re: st: Polynomial Fitting and RD Design**, from Nick Cox <njcoxstata@gmail.com>
