
From: "Ariel Linden, DrPH" <ariel.linden@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: re:Re: st: Polynomial Fitting and RD Design
Date: Fri, 2 Sep 2011 10:19:33 -0400

I'd like to add a little more verbiage to Austin's points. Perhaps one of the most important developments in the RD design over the last decade has been the focus on the cutoff itself. In other words, we expect that if individuals cannot manipulate their assignment score, then individuals close to the cutoff on either side should be comparable, and thus the design is "as good as randomized". For this to hold, the primary issue is determining the size of the neighborhood around the cutoff within which the model is estimated. As Austin points out, this is a function of both the kernel (usually the triangle kernel in the RD context) and the bandwidth (many choices are available, but an automated optimal-bandwidth procedure relieves the researcher of the burden of testing many alternatives).

So why is this important? Well, as we see from this thread, when one uses all the data (not restricted to a local neighborhood around the cutoff), model fit away from the cutoff comes into play. This leads to the use of polynomials (and all the problems we see with them). Conversely, using a local neighborhood allows us to focus on the individuals who are most similar.

As a complete aside, Maarten, I love the code you wrote... :-)

Ariel

Date: Thu, 1 Sep 2011 07:43:44 -0400
From: Austin Nichols <austinnichols@gmail.com>
Subject: Re: st: Polynomial Fitting and RD Design

Maarten--
Note that the standard for this design is local linear regression with a triangle (AKA edge) kernel, as implemented in -rd- on SSC. But the poster asked about replication, not an optimal design.

On Thu, Sep 1, 2011 at 5:24 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Thu, Sep 1, 2011 at 10:31 AM, Nick Cox wrote:
>> Sure, but that still leaves the non-numeric issues. I guess the main
>> issue is just reproducing behaviour with smooth curves, but what
>> arguments justify any kind of quartic here?
>
> No disagreement with you on that point.
> Actually I think that such a high-degree polynomial is rather dangerous
> for this purpose, as these curves tend to move rather wildly away from
> the data at the extreme ends of the curve, and in these models the break
> is such an extreme end. As a consequence, the break dummy may just
> capture this misfit to the data rather than a real break. Patrick may
> want to consider a fractional polynomial model instead. Below is an
> example of how to estimate both models. The graph shows that the quartic
> curve does show that wild behavior at the break, and the fractional
> polynomial model shows that this is due to overfitting the curve, as in
> this case two linear curves will do just fine.
>
> *--------------- begin example -----------------
> sysuse uslifeexp, clear
> drop if year == 1918 // Spanish flu pandemic
> gen cyear = year - 1950 // center at break
>
> // 4th degree polynomial
> orthpoly cyear , gen(oyear*) degree(4)
> gen D = cyear > 0 if year < .
> forvalues i = 1/4 {
>     gen oyear`i'l = (1-D)*oyear`i'
> }
> forvalues i = 1/4 {
>     gen oyear`i'r = D*oyear`i'
> }
>
> // fit model
> reg le oyear?? D
>
> // predict outcome
> predict pol
>
> // fractional polynomial
> gen cyearl = (1-D)*cyear
> gen cyearr = D*cyear
>
> // fit model
> mfp, df(8) : reg le cyearl cyearr D
>
> // predict outcome
> predict mfp
>
> // Graph the models
> twoway line le pol mfp year, ///
>     xline(1950) ///
>     lstyle(solid solid solid) ///
>     lcolor(black red blue) ///
>     legend(order( 1 "data" ///
>                   2 "quartic" ///
>                   3 "fractional" ///
>                     "polynomial" ))
> *---------------- end example ------------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
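To make the "local neighborhood" idea concrete, here is a minimal sketch of local linear regression with a triangle kernel around the cutoff, reusing Maarten's life-expectancy data with the break at 1950. It is a hand-coded illustration, not the -rd- command from SSC, and it does not choose the bandwidth automatically: the bandwidth of 20 years (`h`) and the variable names `w` and `Dcyear` are hypothetical choices made up for this example. Each observation gets weight 1 - |cyear|/h, so years at the cutoff count most and years h or more away drop out entirely. Interacting the slope with D fits a separate line on each side, and the coefficient on D estimates the jump at the cutoff.

```stata
* Sketch: local linear RD with a triangle (edge) kernel, hand-coded
* (not -rd-; the bandwidth h = 20 is a hypothetical choice, not an optimal one)
sysuse uslifeexp, clear
drop if year == 1918                 // Spanish flu pandemic
gen cyear = year - 1950              // center at break
gen D = cyear > 0 if year < .        // 1 after 1950, 0 otherwise

local h = 20                         // hypothetical bandwidth in years
gen w = max(0, 1 - abs(cyear)/`h')   // triangle kernel weight, 0 outside the bandwidth
gen Dcyear = D*cyear                 // lets the slope differ on each side

* weighted linear fit on each side, restricted to the neighborhood
regress le D cyear Dcyear [aweight = w] if w > 0

* coefficient on D = estimated jump in life expectancy at the cutoff
display "estimated jump at 1950: " _b[D]
```

Rerunning with a smaller or larger `h` shows how sensitive the estimated jump is to the size of the neighborhood, which is exactly the choice that automated optimal-bandwidth procedures are meant to make for the researcher.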
