Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?

From: "Schaffer, Mark E"
Subject: RE: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
Date: Fri, 15 Jul 2011 19:54:34 +0100

```
There is something peculiar going on here...

When I try to replicate Chris' example but using the sample votex
dataset Austin provides with -rd-, I get no sensitivity to scaling.  But
when I do it using the auto dataset as Chris does, I get the same
sensitivity to scaling that he does.  In fact, if price is rescaled by a
factor of 1,000,000 instead of Chris' 1,000, -rd- exits with an
"insufficient observations" error!  Very curious....

--Mark

**************************************

votex example:

use votex, clear
gen double LNE=lne/1000
sum lne LNE d
rd lne d, mbw(100)
rd LNE d, mbw(100)

Output:

. sum lne LNE d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lne |       349    21.32478    .4329206   19.65047    23.1144
         LNE |       349    .0213248    .0004329   .0196505   .0231144
           d |       349    .0502933    .1604194  -.2756163   .4696784

. rd lne d, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is d
Treatment variable X_T unspecified
Outcome variable y is lne

Estimating for bandwidth .29287775925349
------------------------------------------------------------------------------
         lne |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0773955   .1056062    -0.73   0.464      -.28438    .1295889
------------------------------------------------------------------------------

. rd LNE d, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is d
Treatment variable X_T unspecified
Outcome variable y is LNE

Estimating for bandwidth .2928777592534422
------------------------------------------------------------------------------
         LNE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -.0000774   .0001056    -0.73   0.464    -.0002844    .0001296

**************************************

auto example:

Code:

sysuse auto, clear
gen double Price = price/1000
gen double PRICE = price/1000000
gen double z = length - 193
sum price Price PRICE z
rd price z, mbw(100)
rd Price z, mbw(100)
rd PRICE z, mbw(100)

Output:

. sum price Price PRICE z

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
       Price |        74    6.165257    2.949496      3.291     15.906
       PRICE |        74    .0061653    .0029495    .003291    .015906
           z |        74   -5.067568    22.26634        -51         40

. rd price z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is price

Estimating for bandwidth 24.98807626042474
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   -5198.13   2230.786    -2.33   0.020    -9570.391   -825.8697
------------------------------------------------------------------------------

. rd Price z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is Price

Estimating for bandwidth 8.731619909031293
------------------------------------------------------------------------------
       Price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |  -7.547781   2.493275    -3.03   0.002    -12.43451   -2.661051
------------------------------------------------------------------------------

. rd PRICE z, mbw(100)
Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is z
Treatment variable X_T unspecified
Outcome variable y is PRICE

insufficient observations
r(2001);

**************************************

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Austin Nichols
> Sent: 15 July 2011 15:23
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RD Optimal Bandwidth Algorithm sensitive to scaling?
>
> Chris--
> I agree it is an undesirable "feature" of the optimal
> bandwidth calculation, but some problem of this sort is
> probably unavoidable--in this case it arises from estimating
> local curvature using squared deviations of the outcome,
> which is evidently sensitive to scale.
> There are alternative approaches which would not face this
> exact problem, but there would almost surely be other
> problems, or other ways of breaking the estimator.  The
> sensitivity of bandwidth to scale is particularly
> undesirable, but also serves to illustrate what I have said
> elsewhere: bandwidth selection is more art than science, and
> at a minimum you should assess the sensitivity of your
> estimates to bandwidth, which is why graphs for multiple
> bandwidths are produced by default in -rd-, and there is an
> option -bdep- to assess the dependence graphically.
>
> On Fri, Jul 15, 2011 at 9:45 AM, Stata Chris
> <statachris@gmail.com> wrote:
> > Dear list members,
> >
> > I am using Austin Nichols' -rd-
> > (http://ideas.repec.org/c/boc/bocode/s456888.html) command, as well as
> > the related -rdob- by Fuji-Imbens-Kalyanaraman
> > (http://www.economics.harvard.edu/faculty/imbens/software_imbens)
> >
> > Now I've discovered that the optimal bandwidth chosen and hence the
> > resulting estimates are sensitive to the scaling of the outcome variable.
> > To demonstrate this, I make use of an example discussed in this
> > context in an earlier post:
> >
> > sysuse auto, clear
> > gen Price = price/1000
> > gen z = length - 193
> > rd price z
> > rd Price z
> >
> >
> > As you can check, when I use as outcome the price in 1000 dollars
> > ("Price") rather than in dollars ("price"), I get a different
> > bandwidth and hence a very different estimate, whereas I think I would
> > wish to get the previous estimate just divided by 1000.
> >
> > This does not seem a very desirable property to me, but I'm not sure
> > where in the optimal bandwidth algorithm (see
> > http://www.nber.org/papers/w14726 ) this comes from and whether it
> > would be possible to avoid this. Probably some of you can say more
> >
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.

```
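
One way to separate the two effects in the auto example, offered as a sketch and not as part of the original exchange: hold the bandwidth fixed so that only the scale of the outcome changes. Local linear regression is linear in the outcome, so with a common bandwidth the estimate for Price should be the estimate for price divided by 1,000; any remaining difference would point to the bandwidth selection step. The sketch assumes -rd- is installed from SSC, that its -bwidth()- option fixes the bandwidth as described in its help file, and that the estimate can be read back as _b[lwald] (the coefficient name shown in the output). The value 25 is an arbitrary choice close to the bandwidth -rd- selected for price.

```stata
* Sketch: fix the bandwidth so that only the outcome scale differs.
* Assumption: -rd- is installed (ssc install rd) and supports -bwidth()-.
sysuse auto, clear
gen double Price = price/1000
gen double z = length - 193

* 25 is an arbitrary illustrative bandwidth, not an optimal one.
rd price z, bwidth(25) mbw(100)
scalar b_dollars = _b[lwald]      // assumes the estimate is posted as lwald

rd Price z, bwidth(25) mbw(100)
scalar b_thousands = _b[lwald]

* With a common bandwidth the ratio should be 1000 up to rounding error.
display "ratio of estimates = " b_dollars / b_thousands
```

The same comparison can be repeated with PRICE = price/1000000 to check whether the "insufficient observations" error persists once the bandwidth is no longer chosen automatically.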