Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Polynomial Fitting and RD Design
Date: Thu, 1 Sep 2011 08:22:43 +0100

I've not read these papers, and it may be that replication here
includes critique. A specific recommendation is to use -generate-
powers as -double-s.

Nick

On Thu, Sep 1, 2011 at 7:37 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Even if you can get this to work as intended, look at the sizes of
> those coefficients! The resultant curve may look about right, but this
> is a dubious thing to do numerically and statistically. I can't
> comment on the underlying scientific rationale for quartics here,
> although I will guess wildly that there isn't one.
>
> Nick
>
> On Thu, Sep 1, 2011 at 3:59 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Patrick Button <pbutton@uci.edu>:
>> Try redefining your x so that the discontinuity is at zero.
>>
>> On Wed, Aug 31, 2011 at 9:54 PM, Patrick Button <pbutton@uci.edu> wrote:
>>> Hello Stata users,
>>>
>>> I've been getting some unexpected Stata output when fitting polynomials
>>> using a pretty simple OLS regression.
>>>
>>> I am replicating a regression discontinuity design paper (Lee, Moretti and
>>> Butler 2004). The paper is here:
>>> http://emlab.berkeley.edu/~moretti/final.pdf Code and data are here:
>>> http://emlab.berkeley.edu/~moretti/data3.html (I am using enricoall2.dta).
>>>
>>> I need to run a regression that fits a 4th degree polynomial separately
>>> for points of the running variable, x, below 0.5 and above 0.5. The
>>> regression includes a dummy variable for if x >= 0.5 or not as well. If
>>> there is a discontinuity at 0.5, then this is picked up in the coefficient
>>> on that dummy variable.
>>>
>>> In this case the running variable is the vote share that the Democratic
>>> candidate got in U.S. House of Representatives elections, including just
>>> the Democratic and Republican votes. So x < 0.5 means a Republican won,
>>> and >= 0.5 means a Democrat won.
>>>
>>> I would like to pool the data instead of running a separate regression for
>>> each side. This is one of the recommended methods in the RD literature.
>>> For some reason this method does not appear in the authors' code so I need
>>> to do it myself.
>>>
>>> I'm running and setting up the regression as follows:
>>>
>>> ***
>>> gen x = demvoteshare
>>>
>>> gen D = 1 if x >=0.5
>>> replace D = 0 if x < 0.5
>>>
>>> *Left Side Polynomial
>>> gen xa = (1-D)*x
>>> gen x2a = (1-D)*x^2
>>> gen x3a = (1-D)*x^3
>>> gen x4a = (1-D)*x^4
>>>
>>> *Right Side Polynomial
>>> gen xb = D*x
>>> gen x2b = D*x^2
>>> gen x3b = D*x^3
>>> gen x4b = D*x^4
>>>
>>> regress realincome D xa x2a x3a x4a xb x2b x3b x4b
>>>
>>> ***
>>>
>>> Based on what the authors of the paper got, graphical analysis, and logic,
>>> there should be no jump in realincome at 0.5. There is no reason why
>>> income should be suddenly much different for districts that democrats just
>>> barely won or just barely lost. If it is, this invalidates the regression
>>> discontinuity design. So the coefficient on D should be statistically
>>> insignificant. However, I get the following results:
>>>
>>> ------------------------------------------------------------------------------
>>>   realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>            D |   497414.5   94802.12     5.25   0.000       311589    683240.1
>>>           xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
>>>          x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
>>>          x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
>>>          x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
>>>           xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
>>>          x2b |    5828381    1112850     5.24   0.000      3647038     8009724
>>>          x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
>>>          x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
>>>        _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
>>> ------------------------------------------------------------------------------
>>>
>>> I have no idea why D is statistically significant, and why only the
>>> polynomial on the right side is statistically significant. This is not
>>> just a problem with this regression. I get messed up results for every
>>> regression I run that has a 4th degree polynomial on each side of 0.5.
>>>
>>> However, I do not get weird results like this when I use just one 4th
>>> degree polynomial (one for the entire thing) with the D dummy.
>>>
>>> Does anyone know what I am doing wrong? I have no idea but I have a
>>> feeling that i'm missing something obvious.
>>>
>>> Thank you very much for your time and consideration.
>>>
>>> --
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
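
The two suggestions in this thread (shift x so that the cutoff is at zero, and -generate- the powers as -double-s) might be combined along the following lines. This is only a sketch, not the authors' code. With x uncentred, the coefficient on D is the gap between the two fitted quartics extrapolated to x = 0, far from the cutoff; once x is centred at 0.5, every power vanishes at the cutoff and the coefficient on D is the estimated jump there. The simulated data, the seed, and the variable names xa1-xa4 and xb1-xb4 are assumptions made so that the example runs on its own; demvoteshare and realincome are the names used in the original post.

```stata
* Placeholder data so that the sketch runs by itself (an assumption, not
* the authors' data). With the real file, replace this block with
*     use enricoall2, clear
clear
set seed 2011
set obs 2000
generate double demvoteshare = runiform()
generate double realincome   = 30000 + 2000*rnormal()

* Shift the running variable so that the discontinuity is at zero
generate double x = demvoteshare - 0.5

* Indicator for a Democratic win; left missing wherever x is missing
generate byte D = (x >= 0) if !missing(x)

* Powers of x stored as doubles, one set for each side of the cutoff
forvalues p = 1/4 {
    generate double xa`p' = (1 - D) * x^`p'
    generate double xb`p' = D * x^`p'
}

* Pooled regression; the coefficient on D is now the jump at the cutoff
regress realincome D xa1 xa2 xa3 xa4 xb1 xb2 xb3 xb4
```

With the placeholder data there is no jump by construction, so the coefficient on D should be small relative to its standard error; what the real data show is a separate question.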
