Re: st: Polynomial Fitting and RD Design


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Polynomial Fitting and RD Design
Date   Thu, 1 Sep 2011 08:22:43 +0100

I've not read these papers, and it may be that replication here
includes critique.

A specific recommendation is to -generate- the powers as -double-s.
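
Holding the powers as doubles could look like the minimal sketch below. The simulated x is only a stand-in for demvoteshare so that the lines run without the data; the names x4_float, x4_double and diff are hypothetical and not from the original post.

```stata
* Sketch under assumptions: simulated x stands in for demvoteshare
clear
set seed 12345
set obs 1000
gen double x = runiform()

* The same power stored as float (Stata's default type) and as double
gen float  x4_float  = x^4
gen double x4_double = x^4

* The float copy is rounded to roughly 7 significant digits
gen double diff = abs(x4_float - x4_double)
summarize diff
```

With the real data the corresponding line would be of the form -gen double x4a = (1-D)*x^4-, and likewise for the other powers.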

Nick

On Thu, Sep 1, 2011 at 7:37 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Even if you can get this to work as intended, look at the sizes of
> those coefficients! The resultant curve may look about right, but this
> is a dubious thing to do numerically and statistically. I can't
> comment on the underlying scientific rationale for quartics here,
> although I will guess wildly that there isn't one.
>
> Nick
>
> On Thu, Sep 1, 2011 at 3:59 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Patrick Button <pbutton@uci.edu>:
>> Try redefining your x so that the discontinuity is at zero.
>>
>> On Wed, Aug 31, 2011 at 9:54 PM, Patrick Button <pbutton@uci.edu> wrote:
>>> Hello Stata users,
>>>
>>> I've been getting some unexpected Stata output when fitting polynomials
>>> using a pretty simple OLS regression.
>>>
>>> I am replicating a regression discontinuity design paper (Lee, Moretti and
>>> Butler 2004). The paper is here:
>>> http://emlab.berkeley.edu/~moretti/final.pdf Code and data are here:
>>> http://emlab.berkeley.edu/~moretti/data3.html (I am using enricoall2.dta).
>>>
>>> I need to run a regression that fits a 4th degree polynomial separately
>>> for points of the running variable, x, below 0.5 and above 0.5. The
>>> regression also includes a dummy variable for whether x >= 0.5. If
>>> there is a discontinuity at 0.5, then it is picked up in the coefficient
>>> on that dummy variable.
>>>
>>> In this case the running variable is the vote share that the Democratic
>>> candidate got in U.S. House of Representatives elections, including just
>>> the Democratic and Republican votes. So x < 0.5 means a Republican won,
>>> and >= 0.5 means a Democrat won.
>>>
>>> I would like to pool the data instead of running a separate regression for
>>> each side. This is one of the recommended methods in the RD literature.
>>> For some reason this method does not appear in the authors' code so I need
>>> to do it myself.
>>>
>>> I'm running and setting up the regression as follows:
>>>
>>> ***
>>> gen x = demvoteshare
>>>
>>> gen D = 1 if x >=0.5
>>> replace D = 0 if x < 0.5
>>>
>>> *Left Side Polynomial
>>> gen xa = (1-D)*x
>>> gen x2a = (1-D)*x^2
>>> gen x3a = (1-D)*x^3
>>> gen x4a = (1-D)*x^4
>>>
>>> *Right Side Polynomial
>>> gen xb = D*x
>>> gen x2b = D*x^2
>>> gen x3b = D*x^3
>>> gen x4b = D*x^4
>>>
>>> regress realincome D xa x2a x3a x4a xb x2b x3b x4b
>>>
>>> ***
>>>
>>> Based on what the authors of the paper got, graphical analysis, and logic,
>>> there should be no jump in realincome at 0.5. There is no reason why
>>> income should be suddenly much different for districts that Democrats just
>>> barely won or just barely lost. If it is, this invalidates the regression
>>> discontinuity design. So the coefficient on D should be statistically
>>> insignificant. However, I get the following results:
>>>
>>> ------------------------------------------------------------------------------
>>>   realincome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>            D |   497414.5   94802.12     5.25   0.000       311589    683240.1
>>>           xa |   34396.25   27783.67     1.24   0.216    -20063.66    88856.17
>>>          x2a |  -22571.61   234577.9    -0.10   0.923    -482377.5    437234.3
>>>          x3a |  -429659.3   655505.3    -0.66   0.512     -1714542    855223.6
>>>          x4a |   667813.9   598416.4     1.12   0.264    -505166.7     1840795
>>>           xb |   -2805647   534665.3    -5.25   0.000     -3853667    -1757628
>>>          x2b |    5828381    1112850     5.24   0.000      3647038     8009724
>>>          x3b |   -5281210    1012800    -5.21   0.000     -7266441    -3295979
>>>          x4b |    1754682   339914.5     5.16   0.000      1088402     2420963
>>>        _cons |   31536.64   501.1422    62.93   0.000     30554.33    32518.95
>>> ------------------------------------------------------------------------------
>>>
>>> I have no idea why D is statistically significant, and why only the
>>> polynomial on the right side is statistically significant. This is not
>>> just a problem with this regression. I get messed up results for every
>>> regression I run that has a 4th degree polynomial on each side of 0.5.
>>>
>>> However, I do not get weird results like this when I use just one 4th
>>> degree polynomial (one for the entire thing) with the D dummy.
>>>
>>> Does anyone know what I am doing wrong? I have no idea but I have a
>>> feeling that I'm missing something obvious.
>>>
>>> Thank you very much for your time and consideration.
>>>
>>> --
>
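
The suggestion quoted above, to redefine x so that the discontinuity is at zero, can be sketched as follows. With the uncentered powers, the coefficient on D is the gap between the two fitted quartics extrapolated to x = 0, not the jump at 0.5. After centering, the coefficient on D is the jump at the cutoff itself. This is only a sketch on simulated data: the data-generating line, the centered variable xc and the names xc1a ... x4b are assumptions for illustration, not the authors' code or data.

```stata
* Sketch under assumptions: simulated data with no true jump at 0.5
clear
set seed 2011
set obs 2000
gen double x = runiform()
gen double realincome = 30000 + 5000*x + rnormal(0, 1000)

* Indicator for x >= 0.5, left missing where x is missing
gen byte D = x >= 0.5 if !missing(x)

* Centered running variable: the cutoff is now at xc == 0
gen double xc = x - 0.5

* Separate quartics on each side, in centered and in raw powers
forvalues p = 1/4 {
    gen double xc`p'a = (1 - D) * xc^`p'
    gen double xc`p'b = D * xc^`p'
    gen double x`p'a  = (1 - D) * x^`p'
    gen double x`p'b  = D * x^`p'
}

* Centered fit: _b[D] is the estimated jump at the cutoff
regress realincome D xc1a xc2a xc3a xc4a xc1b xc2b xc3b xc4b
scalar jump_centered = _b[D]

* Uncentered fit: the jump at 0.5 is a linear combination of coefficients
regress realincome D x1a x2a x3a x4a x1b x2b x3b x4b
lincom D + 0.5*(x1b - x1a) + 0.25*(x2b - x2a) + 0.125*(x3b - x3a) + 0.0625*(x4b - x4a)
scalar jump_lincom = r(estimate)
scalar jump_se     = r(se)

display "centered: " jump_centered "   lincom on raw powers: " jump_lincom
```

The two fits span the same set of curves, so the two jump estimates should agree up to rounding error; the centered version simply reads the jump directly off the coefficient on D.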
