Thank you Yulia.

On 11 August 2010 03:55, Yulia Marchenko, StataCorp LP <ymarchenko@stata.com> wrote:
> Alex Olssen <alex.olssen@gmail.com> has a follow up question about -lpoly-:
>
>> Does anybody know how -lpoly- chooses how far to extend fitted values
>> outside the values used for estimation?  I have a feeling this should be
>> related to bandwidth but it is not clear how or why.
>>
>> The following code looks like with the rectangle kernel and linear
>> regression -lpoly- estimates to the bwidth - 5 units outside the estimation
>> values.  This seems arbitrary though.  Is there a good reason?
>>
>>   sysuse auto, clear
>>   sort length
>>   lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
>>   lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
>>   ...
>
> -lpoly- evaluates the smooth at each specified grid point.  In Alex's example,
> the grid points are determined by all the values of the -length- variable, as
> specified by the -at()- option.  The range of grid values for which -lpoly-
> reports a nonmissing smoothed value is not arbitrary and is determined by how
> many observations are available to perform a (local) regression fit at each
> grid point.
>
> For each grid point, the set of values to be used in a local regression fit is
> determined by the weights which represent the "nearness" of each observation
> to the target grid point: the "further" an observation is from the grid point
> the closer its weight to zero.  The weights are determined by both a specified
> bandwidth and a chosen kernel function; Alex can find more details about the
> actual computation in the Methods and Formulas section of the documentation
> entry for -lpoly-, -[R] lpoly- on p. 939.  Only the observations with nonzero
> weights are used in a local regression fit.  The fit is computed if there are
> at least two observations in a local region; otherwise, a missing value is
> returned.
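
The rule described above can be checked by hand. Below is a minimal sketch, not a description of what -lpoly- does internally. It assumes that the rectangular kernel gives an observation nonzero weight exactly when |length - grid point| < bwidth; the strict inequality is my assumption, and the Methods and Formulas section is the authority. n_local is a hypothetical variable name introduced only for this illustration.

```stata
* Sketch only: count the observations that would receive nonzero weight at
* each grid point, assuming weight > 0 if and only if |x - x0| < h.
sysuse auto, clear
sort length
local h 10
generate long n_local = .
forvalues i = 1/`=_N' {
    * Same -if- restriction as in the -lpoly- call below
    quietly count if length < 190 & abs(length - length[`i']) < `h'
    quietly replace n_local = r(N) in `i'
}

* Smooth from -lpoly- with the same settings as in my original example
lpoly price length if length<190, kernel(rectangle) degree(1) bwidth(`h') nograph generate(L10) at(length)

* Grid points with fewer than two local observations, next to the smooth
list length n_local L10 if n_local < 2
```

This should be run as a single do-file so that the local macro h stays defined throughout. If the assumed rule matches what -lpoly- does, L10 should be missing on the listed rows; changing h to 20 (and the variable name L10 to L20) repeats the check for the second example.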
>
> Returning to Alex's examples, with the rectangular kernel and the bandwidth
> equal to 10, the last grid point for which there are at least two observations
> in a local regression fit (taking into account the specified -if- restriction)
> is 195.  In the example with the same kernel and the bandwidth equal to 20,
> the last such grid point is 204.  As expected, holding everything else
> constant, increasing the bandwidth increases the range of grid values for
> which the smooth evaluates to a nonmissing value.
>
>
> -- Yulia
> ymarchenko@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

