From: Alex Olssen <alex.olssen@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Thu, 12 Aug 2010 11:10:23 +1000

Thank you Yulia.

On 11 August 2010 03:55, Yulia Marchenko, StataCorp LP <ymarchenko@stata.com> wrote:
> Alex Olssen <alex.olssen@gmail.com> has a follow up question about -lpoly-:
>
>> Does anybody know how -lpoly- chooses how far to extend fitted values
>> outside the values used for estimation? I have a feeling this should be
>> related to bandwidth but it is not clear how or why.
>>
>> The following code looks like with the rectangle kernel and linear
>> regression -lpoly- estimates to the bwidth - 5 units outside the estimation
>> values. This seems arbitrary though. Is there a good reason?
>>
>> sysuse auto, clear
>> sort length
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
>> ...
>
> -lpoly- evaluates the smooth at each specified grid point. In Alex's example,
> the grid points are determined by all the values of the -length- variable, as
> specified by the -at()- option. The range of grid values for which -lpoly-
> reports a nonmissing smoothed value is not arbitrary and is determined by how
> many observations are available to perform a (local) regression fit at each
> grid point.
>
> For each grid point, the set of values to be used in a local regression fit is
> determined by the weights which represent the "nearness" of each observation
> to the target grid point: the "further" an observation is from the grid point
> the closer its weight to zero. The weights are determined by both a specified
> bandwidth and a chosen kernel function; Alex can find more details about the
> actual computation in the Methods and Formulas section of the documentation
> entry for -lpoly-, -[R] lpoly- on p. 939. Only the observations with nonzero
> weights are used in a local regression fit. The fit is computed if there are
> at least two observations in a local region; otherwise, a missing value is
> returned.
>
> Returning to Alex's examples, with the rectangular kernel and the bandwidth
> equal to 10, the last grid point for which there are at least two observations
> in a local regression fit (taking into account the specified -if- restriction)
> is 195. In the example with the same kernel and the bandwidth equal to 20,
> the last such grid point is 204. As expected, holding everything else
> constant, increasing the bandwidth increases the range of grid values for
> which the smooth evaluates to a nonmissing value.
>
>
> -- Yulia
> ymarchenko@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
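
The rule Yulia describes (a fit is computed at a grid point only when at least two observations receive nonzero weight there) can be illustrated with a rough sketch. This is Python, not Stata, and it is not -lpoly-'s implementation. It assumes a rectangular kernel whose weight is nonzero strictly inside the bandwidth, |x - x0| < h, and the x values are made up rather than taken from auto.dta; they were chosen so that the cutoffs line up with the 195 and 204 quoted above. All function names are hypothetical.

```python
def rectangle_weight(x, x0, h):
    """Rectangular-kernel weight of observation x at grid point x0.

    Assumption: the weight is constant strictly inside the window
    |x - x0| < h and zero elsewhere.
    """
    return 0.5 if abs(x - x0) < h else 0.0


def has_fit(xs, x0, h, min_obs=2):
    """True when at least min_obs observations have nonzero weight at x0."""
    n_used = sum(1 for x in xs if rectangle_weight(x, x0, h) > 0)
    return n_used >= min_obs


def last_fitted_grid_point(xs, grid, h):
    """Largest grid point at which a local fit could be computed, or None."""
    fitted = [g for g in grid if has_fit(xs, g, h)]
    return max(fitted) if fitted else None


# Made-up estimation sample (all values below 190) and evaluation grid.
estimation_x = [170, 175, 180, 186, 186, 189]
grid = [170, 175, 180, 186, 189, 192, 195, 196, 200, 204, 206]

print(last_fitted_grid_point(estimation_x, grid, 10))  # 195
print(last_fitted_grid_point(estimation_x, grid, 20))  # 204
```

With the bandwidth at 10, grid point 196 sees only one observation (189) inside its window, so it would be missing. Doubling the bandwidth pushes the last usable grid point out to 204, in line with the explanation above.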

**References**:
- **Re: st: lpoly and nonmissing fitted values where the dependent variable is missing** *From:* ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
