Re: st: lpoly and nonmissing fitted values where the dependent variable is missing

From: Alex Olssen
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Thu, 12 Aug 2010 11:10:23 +1000

```Thank you Yulia.

On 11 August 2010 03:55, Yulia Marchenko, StataCorp LP
<ymarchenko@stata.com> wrote:
>> Does anybody know how -lpoly- chooses how far to extend fitted values
>> outside the values used for estimation? I have a feeling this should be
>> related to bandwidth but it is not clear how or why.
>>
>> The following code suggests that, with the rectangle kernel and linear
>> regression, -lpoly- produces fitted values up to (bwidth - 5) units beyond
>> the values used for estimation.  This seems arbitrary though.  Is there a
>> good reason?
>>
>> sysuse auto, clear
>> sort length
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
>
> -lpoly- evaluates the smooth at each specified grid point.  In Alex's example,
> the grid points are determined by all the values of the -length- variable, as
> specified by the -at()- option.  The range of grid values for which -lpoly-
> reports a nonmissing smoothed value is not arbitrary and is determined by how
> many observations are available to perform a (local) regression fit at each
> grid point.
>
> For each grid point, the set of values to be used in a local regression fit is
> determined by the weights which represent the "nearness" of each observation
> to the target grid point: the "further" an observation is from the grid point
> the closer its weight to zero.  The weights are determined by both a specified
> bandwidth and a chosen kernel function; Alex can find more details about the
> actual computation in the Methods and Formulas section of the documentation
> entry for -lpoly-, -[R] lpoly- on p. 939.  Only the observations with nonzero
> weights are used in a local regression fit.  The fit is computed if there are
> at least two observations in a local region; otherwise, a missing value is
> returned.
>
> Returning to Alex's examples, with the rectangular kernel and the bandwidth
> equal to 10, the last grid point for which there are at least two observations
> in a local regression fit (taking into account the specified -if- restriction)
> is 195.  In the example with the same kernel and the bandwidth equal to 20,
> the last such grid point is 204.  As expected, holding everything else
> constant, increasing the bandwidth increases the range of grid values for
> which the smooth evaluates to a nonmissing value.
>
> -- Yulia
> ymarchenko@stata.com
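
A rough way to check this against the data is to count, for each grid point,
how many estimation observations fall inside the window and compare that count
with whether the smooth is missing.  This is only a sketch: nwin is a made-up
variable name, and the strict inequality assumes that the rectangle kernel
gives zero weight at a distance of exactly one bandwidth.

sysuse auto, clear
sort length
lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
* nwin = number of estimation observations within 10 units of each grid point
generate nwin = .
forvalues i = 1/`=_N' {
    quietly count if length < 190 & abs(length - length[`i']) < 10
    quietly replace nwin = r(N) in `i'
}
* if the explanation above holds, L10 should be missing where nwin < 2
list length nwin L10 if length >= 190

Changing 10 to 20 and L10 to L20 would give the same comparison for the second
example.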
