



Re: st: lpoly and nonmissing fitted values where the dependent variable is missing


From   ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date   Tue, 10 Aug 2010 12:55:53 -0500

Alex Olssen <alex.olssen@gmail.com> has a follow-up question about -lpoly-:

> Does anybody know how -lpoly- chooses how far to extend fitted values
> outside the values used for estimation? I have a feeling this should be
> related to bandwidth but it is not clear how or why.
>
> The following code looks like with the rectangle kernel and linear
> regression -lpoly- estimates to the bwidth - 5 units outside the estimation
> values.  This seems arbitrary though.  Is there a good reason?
> 
> sysuse auto, clear
> sort length
> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
> ...

-lpoly- evaluates the smooth at each specified grid point.  In Alex's example,
the grid points are determined by all the values of the -length- variable, as
specified by the -at()- option.  The range of grid values for which -lpoly-
reports a nonmissing smoothed value is not arbitrary and is determined by how
many observations are available to perform a (local) regression fit at each
grid point.

For each grid point, the set of values to be used in a local regression fit is
determined by the weights, which represent the "nearness" of each observation
to the target grid point: the "farther" an observation is from the grid point,
the closer its weight is to zero.  The weights are determined by both a specified
bandwidth and a chosen kernel function; Alex can find more details about the
actual computation in the Methods and Formulas section of the documentation
entry for -lpoly-, -[R] lpoly- on p. 939.  Only the observations with nonzero
weights are used in a local regression fit.  The fit is computed if there are
at least two observations in a local region; otherwise, a missing value is
returned.
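
The rule in the paragraph above can be sketched in a few lines of Python.  This
is only an illustration, not -lpoly-'s actual code: the data are hypothetical
(not the auto dataset), the helper names are made up, and the sketch assumes
that the rectangular kernel gives a constant weight of 1/2 to observations
strictly within one bandwidth of the grid point and zero weight otherwise.

```python
def rectangular_weights(xs, x0, h):
    """Assumed rectangular kernel: weight 1/2 if |x - x0| < h, else 0."""
    return [0.5 if abs(x - x0) < h else 0.0 for x in xs]


def n_used(xs, x0, h):
    """Number of observations with nonzero weight at grid point x0."""
    return sum(w > 0 for w in rectangular_weights(xs, x0, h))


# Hypothetical x values of the estimation sample (not the auto data).
xs = [1, 2, 3, 4, 5, 6, 7, 8]

# A fit is reported only when at least two observations have nonzero weight.
for x0 in (8, 9, 10):
    n = n_used(xs, x0, 2)
    print(x0, n, "fit" if n >= 2 else "missing")
```

With a bandwidth of 2, the grid point 8 still has two observations in its
window, while 9 and 10 have fewer than two and so would get a missing value
under this rule.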

Returning to Alex's examples, with the rectangular kernel and the bandwidth
equal to 10, the last grid point for which there are at least two observations
in a local regression fit (taking into account the specified -if- restriction)
is 195.  In the example with the same kernel and the bandwidth equal to 20,
the last such grid point is 204.  As expected, holding everything else
constant, increasing the bandwidth increases the range of grid values for
which the smooth evaluates to a nonmissing value.
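
The effect of the bandwidth on the range of nonmissing grid points can be
illustrated the same way.  Again, this is a sketch under stated assumptions,
not -lpoly- itself: the sample and grid are hypothetical, the function name is
made up, and an observation is assumed to count only when it lies strictly
within one bandwidth of the grid point.

```python
def last_fitted_grid_point(xs, grid, h, min_obs=2):
    """Largest grid value with at least min_obs observations strictly within h."""
    fitted = [g for g in grid if sum(abs(x - g) < h for x in xs) >= min_obs]
    return max(fitted) if fitted else None


# Hypothetical estimation sample and evaluation grid (not the auto data).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
grid = list(range(0, 21))

# A wider bandwidth pushes the last nonmissing grid point farther out.
for h in (2, 4):
    print(h, last_fitted_grid_point(xs, grid, h))
```

In this toy setup the last grid point with a fit is governed by the second
largest x value, because two observations must fall inside the window.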


-- Yulia
ymarchenko@stata.com

