
# Re: st: lpoly and nonmissing fitted values where the dependent variable is missing

From: ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Tue, 10 Aug 2010 12:55:53 -0500

```
Alex Olssen <alex.olssen@gmail.com> has a follow-up question about -lpoly-:

> Does anybody know how -lpoly- chooses how far to extend fitted values
> outside the values used for estimation? I have a feeling this should be
> related to bandwidth but it is not clear how or why.
>
> The following code looks like with the rectangle kernel and linear
> regression -lpoly- estimates to the bwidth - 5 units outside the estimation
> values.  This seems arbitrary though.  Is there a good reason?
>
> sysuse auto, clear
> sort length
> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
> ...

-lpoly- evaluates the smooth at each specified grid point.  In Alex's example,
the grid points are determined by all the values of the -length- variable, as
specified by the -at()- option.  The range of grid values for which -lpoly-
reports a nonmissing smoothed value is not arbitrary and is determined by how
many observations are available to perform a (local) regression fit at each
grid point.

For each grid point, the set of observations used in the local regression fit
is determined by weights that represent the "nearness" of each observation to
the target grid point: the farther an observation is from the grid point, the
closer its weight is to zero.  The weights are determined by both the specified
bandwidth and the chosen kernel function; Alex can find more details about the
actual computation in the Methods and Formulas section of the documentation
entry for -lpoly-, -[R] lpoly- on p. 939.  Only the observations with nonzero
weights are used in a local regression fit.  The fit is computed if there are
at least two observations in a local region; otherwise, a missing value is
returned.

Returning to Alex's examples, with the rectangular kernel and the bandwidth
equal to 10, the last grid point for which there are at least two observations
in a local regression fit (taking into account the specified -if- restriction)
is 195.  In the example with the same kernel and the bandwidth equal to 20,
the last such grid point is 204.  As expected, holding everything else
constant, increasing the bandwidth increases the range of grid values for
which the smooth evaluates to a nonmissing value.

-- Yulia
ymarchenko@stata.com
```
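
The rule described in the reply (a fit is computed only when at least two observations carry nonzero weight at a grid point) can be checked by hand. The do-file below is a minimal sketch, not the internal -lpoly- computation: it assumes that, with the rectangular kernel, an observation has nonzero weight at grid point x0 exactly when |length - x0| is strictly less than the bandwidth. The variables nlocal10 and nlocal20 are hypothetical helpers introduced only for this illustration; the -lpoly- calls are Alex's, with the option names written out in full.

```stata
* Sketch only; nlocal10 and nlocal20 are hypothetical helper variables.
* Assumption: with the rectangular kernel, an observation gets a nonzero
* weight at grid point x0 when |length - x0| < bandwidth.
sysuse auto, clear

foreach h in 10 20 {
    generate int nlocal`h' = .
    forvalues i = 1/`=_N' {
        * count estimation-sample observations inside the window
        * around the i-th grid point
        quietly count if length < 190 & abs(length - length[`i']) < `h'
        quietly replace nlocal`h' = r(N) in `i'
    }
    * range of grid points with at least two observations in the window
    summarize length if nlocal`h' >= 2

    * range of grid points where -lpoly- returns a nonmissing smooth
    lpoly price length if length < 190, kernel(rectangle) degree(1) ///
        bwidth(`h') nograph generate(L`h') at(length)
    summarize length if !missing(L`h')
}
```

For each bandwidth, compare the maximum reported by the two -summarize- calls. The reply gives 195 and 204 as the last grid points with a nonmissing smooth; whether the hand count matches exactly depends on the boundary assumption stated above, so treat any disagreement at the edge of the window as a prompt to check the Methods and Formulas section rather than as an error in -lpoly-.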