Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: lpoly and nonmissing fitted values where the dependent variable is missing

From: Alex Olssen
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Tue, 10 Aug 2010 08:33:05 +1000

```
Thanks Austin and Yulia for your helpful responses.

Sorry Austin, I was actually aware of your work and intended to
mention it but forgot to when I sat down to write the email.  It is

Kind regards,

Alex

On 10 August 2010 02:09, Yulia Marchenko, StataCorp LP
<ymarchenko@stata.com> wrote:
> Alex Olssen <alex.olssen@gmail.com> asks why -lpoly- produces smoothed values
> outside the range of <x>-values (the variable -length- below) as defined by an
> -if- statement:
>
>> I am doing a regression discontinuity analysis and want to understand how
>> -lpoly- is working.  I use the -lpoly- options -gen- and -at- to create
>> fitted values for my local linear regression.  Due to the nature of
>> regression discontinuity I look at two subgroups separately.  Fitted values
>> are generated even for observations outside the subgroup.  I want to
>> understand how it chooses where to fit them.
>>
>> For example,
>>
>> sysuse auto, clear
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(12) gen(L) at(length)
>> sort length
>> br L length
>>
>> Cars with lengths up to 212 in. have fitted values.  Does anyone know why?
>>
>> Note the if statement causes no problems.  If I gen lengthlt190=length if
>> length<190 and then lpoly price lengthlt190 the results are identical.
>
> -lpoly- uses two notions of a sample: an estimation sample and a grid sample.
> An estimation sample defines a set of observations to be used in local
> weighted linear regression fits.  A grid sample defines a set of grid points
> at which the smooth will be evaluated.  To link this to the documentation
> (-[R] lpoly-, pp. 939-940), the estimation sample defines the set of x_i's
> used to compute regression coefficients in formula (2) in the documentation
> and the grid sample defines the set of grid points x_0.
>
> An -if- condition only affects the estimation sample and not the grid sample.
> To restrict the range of grid points, Alex should create a new variable in the
> desired range and use it in the -at()- option.  Continuing Alex's example, we
> can use the -lengthlt190- variable in the -at()- option to restrict the range
> of 'at' values to those less than 190:
>
>  . sysuse auto, clear
>  . gen lengthlt190=length if length<190
>  . lpoly price length if length<190,   ///
>                        ker(rec) deg(1) bwidth(12) gen(L) at(lengthlt190)
>
>
> -- Yulia
> ymarchenko@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
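
The distinction Yulia draws between the estimation sample and the grid sample can be checked directly. The following is a minimal sketch for a do-file, not a definitive recipe: it runs -lpoly- twice on the same estimation sample, once with the full -length- variable as the grid and once with -lengthlt190-, and then counts the fitted values that fall outside the subgroup. The names -L_all- and -L_sub- are assumptions introduced here for illustration; -nograph- is added only to suppress the plots.

```stata
* Sketch only: compares the two grids discussed in the thread.
sysuse auto, clear

* Grid variable from the thread: equals length inside the subgroup, missing outside
generate lengthlt190 = length if length < 190

* Grid = every observed length; the -if- limits only the estimation sample
lpoly price length if length < 190, kernel(rectangle) degree(1) bwidth(12) ///
    generate(L_all) at(length) nograph

* Grid = lengths below 190 only
lpoly price length if length < 190, kernel(rectangle) degree(1) bwidth(12) ///
    generate(L_sub) at(lengthlt190) nograph

* Count fitted values at grid points outside the subgroup under each grid
count if !missing(L_all) & length >= 190
count if !missing(L_sub) & length >= 190
```

If the explanation in the thread holds, the first count should be positive, because grid points beyond 190 can still have estimation-sample observations within the bandwidth, and the second should be zero, because those grid points are missing in -lengthlt190-.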