

From: ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Tue, 10 Aug 2010 12:55:53 -0500

Alex Olssen <alex.olssen@gmail.com> has a follow up question about -lpoly-:

> Does anybody know how -lpoly- chooses how far to extend fitted values
> outside the values used for estimation? I have a feeling this should be
> related to bandwidth but it is not clear how or why.
>
> The following code looks like with the rectangle kernel and linear
> regression -lpoly- estimates to the bwidth - 5 units outside the estimation
> values. This seems arbitrary though. Is there a good reason?
>
> sysuse auto, clear
> sort length
> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
> ...

-lpoly- evaluates the smooth at each specified grid point. In Alex's example, the grid points are determined by all the values of the -length- variable, as specified by the -at()- option. The range of grid values for which -lpoly- reports a nonmissing smoothed value is not arbitrary and is determined by how many observations are available to perform a (local) regression fit at each grid point.

For each grid point, the set of values to be used in a local regression fit is determined by the weights which represent the "nearness" of each observation to the target grid point: the "further" an observation is from the grid point the closer its weight to zero. The weights are determined by both a specified bandwidth and a chosen kernel function; Alex can find more details about the actual computation in the Methods and Formulas section of the documentation entry for -lpoly-, -[R] lpoly- on p. 939.

Only the observations with nonzero weights are used in a local regression fit. The fit is computed if there are at least two observations in a local region; otherwise, a missing value is returned.
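The counting rule described above can be checked by hand. The following is only a rough sketch, not -lpoly-'s actual computation: it assumes that the rectangular kernel assigns a nonzero weight exactly when an observation lies strictly within one bandwidth of the grid point, and the names -insample- and -nlocal- are made up for illustration.

```stata
sysuse auto, clear

* Bandwidth to examine; set to 20 to mimic the second -lpoly- call
local h 10

* Estimation sample implied by the -if- restriction in Alex's example
generate byte insample = length < 190

* For every grid point (each observed value of length), count the
* estimation-sample observations that would receive a nonzero weight,
* ASSUMING the rectangular kernel is nonzero only when |length - x0| < h
generate nlocal = .
forvalues i = 1/`=_N' {
    quietly count if insample & abs(length - length[`i']) < `h'
    quietly replace nlocal = r(N) in `i'
}

* Largest grid point with at least two observations in its local region
summarize length if nlocal >= 2, meanonly
display "last grid point with at least two local observations: " r(max)
```

If the kernel assumption holds, the grid point displayed here should match the last value of -length- at which the generated variable (-L10- or -L20-) is nonmissing.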
Returning to Alex's examples, with the rectangular kernel and the bandwidth equal to 10, the last grid point for which there are at least two observations in a local regression fit (taking into account the specified -if- restriction) is 195. In the example with the same kernel and the bandwidth equal to 20, the last such grid point is 204. As expected, holding everything else constant, increasing the bandwidth increases the range of grid values for which the smooth evaluates to a nonmissing value.

-- Yulia
ymarchenko@stata.com
