Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

re:Re: st: Polynomial Fitting and RD Design


From   "Ariel Linden, DrPH" <ariel.linden@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   re:Re: st: Polynomial Fitting and RD Design
Date   Fri, 2 Sep 2011 10:19:33 -0400

I'd like to add a little more verbiage to Austin's points: 

Perhaps one of the most important developments in the RD design in the last
decade, is to focus attention at the cutoff. In other words, we expect that
if individuals cannot manipulate their assignment score, then individuals
close to the cutoff on either side, should be comparable, and thus, the
design is "as good as randomized".

In order for this to be valid, the primary issue is determining the size of
the neighborhood around the cutoff (where the model will be performed). As
Austin points out, this is a function of both kernel (usually the triangle
kernel is used in the RD context) and bandwidth (a lot of choices available
for this, but the optimal bandwidth is an automated process that remove the
burden from the researcher from testing many alternatives).

So why is this important? Well, as we see from this thread, when one uses
all the data (not restricted to a local neighborhood around the cutoff),
then model fit away from the cutoff comes into play. This leads to the use
of polynomials (and all the problems we see with that). Conversely, using a
local neighborhood allows us to focus on the individuals who are most
similar.

As a complete aside, Maarten, I love the code you wrote... :-)

Ariel   

Date: Thu, 1 Sep 2011 07:43:44 -0400
From: Austin Nichols <austinnichols@gmail.com>
Subject: Re: st: Polynomial Fitting and RD Design

Maarten--
Note that the standard for this design is local linear regression with
a triangle (AKA edge) kernel, as implemented in -rd- on SSC.  But the
poster asked about replication, not an optimal design.

On Thu, Sep 1, 2011 at 5:24 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Thu, Sep 1, 2011 at 10:31 AM, Nick Cox wrote:
>> Sure, but that still leaves the non-numeric issues. I guess the main
>> issue is just reproducing behaviour with smooth curves, but what
>> arguments justify any kind of quartic here?
>
> No disagreement with you on that point. Actually I think that such
> high degree polynomial is rather dangerous for this purpose as these
> curves tend to move rather wildly away from the data at the extreme
> ends of the curve, and in these models the break is such an extreme
> end. As a consequence the break dummy may just capture this misfit to
> the data rather than a real break. Patrick may want to consider a
> fractional polynomial model instead. Below is an example on how to
> estimate both models, the graph shows that the quartic curve does show
> that wild behavior at the break, and the fractional polynomial model
> shows that that is due to overfitting the curve as in this case two
> linear curves will do just fine.
>
> *--------------- begin example -----------------
> sysuse uslifeexp, clear
> drop if year == 1918 // Spanish flu pandemic
> gen cyear = year - 1950 // center at break
>
> // 4th degree polynomial
> orthpoly cyear , gen(oyear*) degree(4)
> gen D = cyear > 0 if year < .
> forvalues i = 1/4 {
>        gen oyear`i'l = (1-D)*oyear`i'
> }
> forvalues i = 1/4 {
>        gen oyear`i'r = D*oyear`i'
> }
>
> // fit model
> reg le oyear?? D
>
> // predict outcome
> predict pol
>
> // fractional polynomial
> gen cyearl = (1-D)*cyear
> gen cyearr = D*cyear
>
> // fit model
> mfp, df(8) : reg le cyearl cyearr D
>
> // predict outcome
> predict mfp
>
> // Graph the models
> twoway line le pol mfp year,            ///
>       xline(1950)                      ///
>       lstyle(solid solid solid)        ///
>       lcolor(black red blue)           ///
>       legend(order( 1 "data"           ///
>                     2 "quartic"        ///
>                     3 "fractional"     ///
>                       "polynomial"  ))
> *---------------- end example ------------------
>  (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index