Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: lpoly and nonmissing fitted values where the dependent variable is missing

From: Alex Olssen
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Thu, 12 Aug 2010 11:10:23 +1000

```
Thank you Yulia.

On 11 August 2010 03:55, Yulia Marchenko, StataCorp LP
<ymarchenko@stata.com> wrote:
>
>> Does anybody know how -lpoly- chooses how far to extend fitted values
>> outside the values used for estimation? I have a feeling this should be
>> related to bandwidth but it is not clear how or why.
>>
>> The following code suggests that, with the rectangle kernel and local
>> linear regression, -lpoly- extends fitted values to (bwidth - 5) units beyond
>> the estimation values.  This seems arbitrary though.  Is there a good reason?
>>
>> sysuse auto, clear
>> sort length
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
>> ...
>
> -lpoly- evaluates the smooth at each specified grid point.  In Alex's example,
> the grid points are determined by all the values of the -length- variable, as
> specified by the -at()- option.  The range of grid values for which -lpoly-
> reports a nonmissing smoothed value is not arbitrary and is determined by how
> many observations are available to perform a (local) regression fit at each
> grid point.
>
> For each grid point, the set of values to be used in a local regression fit is
> determined by the weights which represent the "nearness" of each observation
> to the target grid point: the "further" an observation is from the grid point
> the closer its weight to zero.  The weights are determined by both a specified
> bandwidth and a chosen kernel function; Alex can find more details about the
> actual computation in the Methods and Formulas section of the documentation
> entry for -lpoly-, -[R] lpoly- on p. 939.  Only the observations with nonzero
> weights are used in a local regression fit.  The fit is computed if there are
> at least two observations in a local region; otherwise, a missing value is
> returned.
>
> Returning to Alex's examples, with the rectangular kernel and the bandwidth
> equal to 10, the last grid point for which there are at least two observations
> in a local regression fit (taking into account the specified -if- restriction)
> is 195.  In the example with the same kernel and the bandwidth equal to 20,
> the last such grid point is 204.  As expected, holding everything else
> constant, increasing the bandwidth increases the range of grid values for
> which the smooth evaluates to a nonmissing value.
>
>
> -- Yulia
> ymarchenko@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
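
The rule Yulia describes can be checked roughly by counting, for each grid point, how many estimation-sample observations fall within one bandwidth of it. The sketch below is only an illustration, not -lpoly-'s internal code: it assumes the rectangle kernel gives nonzero weight exactly when the absolute distance to the grid point is strictly less than the bandwidth, and `n_local` is a hypothetical variable name introduced here.

```
* Rough check of the "at least two observations" rule (a sketch, not lpoly's own code)
sysuse auto, clear
sort length
lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)

local h 10                      // same bandwidth as in the lpoly call above
generate n_local = .            // hypothetical name: local observation count

* For each grid point, count estimation-sample observations with nonzero weight
* (assumption: rectangle kernel weight is nonzero when |distance| < bandwidth)
forvalues i = 1/`=_N' {
    quietly count if length < 190 & abs(length - length[`i']) < `h'
    quietly replace n_local = r(N) in `i'
}

* Compare the local count with whether the smooth is missing
list length n_local L10 if length >= 190
```

If the assumption about the kernel support holds, `L10` should be missing at grid points where `n_local` is below two. Changing the bandwidth in both the -lpoly- call and the local macro `h` to 20 gives the same comparison for the second example.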