Re: st: lpoly and nonmissing fitted values where the dependent variable is missing

Tue, 10 Aug 2010 10:30:08 +1000

Dear Statlisters,

I have a follow-up question. Does anybody know how -lpoly- chooses how far to
extend fitted values outside the values used for estimation? I have a feeling
this should be related to the bandwidth, but it is not clear how or why. With
the rectangle kernel and a local linear fit, the following code suggests that
-lpoly- produces fitted values up to bwidth - 5 units outside the estimation
values. This seems arbitrary, though. Is there a good reason?

sysuse auto, clear
sort length
lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
lpoly price length if length<190, ker(rec) deg(1) bwidth(30) nogr gen(L30) at(length)
br L30 L20 L10 length

Kind regards,

Alex

On 10 August 2010 08:33, Alex Olssen <alex.olssen@gmail.com> wrote:
> Thanks Austin and Yulia for your helpful responses.
>
> Sorry Austin, I was actually aware of your work and intended to
> mention it but forgot to when I sat down to write the email. It is
> clear and very helpful.
>
> Kind regards,
>
> Alex
>
>
> On 10 August 2010 02:09, Yulia Marchenko, StataCorp LP
> <ymarchenko@stata.com> wrote:
>> Alex Olssen <alex.olssen@gmail.com> asks why -lpoly- produces smoothed values
>> outside the range of <x>-values (the variable -length- below) as defined by an
>> -if- statement:
>>
>>> I am doing a regression discontinuity analysis and want to understand how
>>> -lpoly- is working. I use the -lpoly- options -gen- and -at- to create
>>> fitted values for my local linear regression. Due to the nature of
>>> regression discontinuity I look at two subgroups separately. Fitted values
>>> are generated even for observations that are outside the subgroup. I want to
>>> understand how it chooses where to fit them.
>>>
>>> For example,
>>>
>>> sysuse auto, clear
>>> lpoly price length if length<190, ker(rec) deg(1) bwidth(12) gen(L) at(length)
>>> sort length
>>> br L length
>>>
>>> Cars up to 212 in. long have fitted values. Does anyone know why?
>>>
>>> Note that the -if- statement causes no problems. If I -gen lengthlt190=length
>>> if length<190- and then -lpoly price lengthlt190-, the results are identical.
>>
>> -lpoly- uses two notions of a sample: an estimation sample and a grid sample.
>> An estimation sample defines a set of observations to be used in local
>> weighted linear regression fits. A grid sample defines a set of grid points
>> at which the smooth will be evaluated. To link this to the documentation
>> (-[R] lpoly-, pp. 939-940), the estimation sample defines the set of x_i's
>> used to compute regression coefficients in formula (2) in the documentation
>> and the grid sample defines the set of grid points x_0.
>>
>> An -if- condition only affects the estimation sample and not the grid sample.
>> To restrict the range of grid points, Alex should create a new variable in the
>> desired range and use it in the -at()- option. Continuing Alex's example, we
>> can use the -lengthlt190- variable in the -at()- option to restrict the range
>> of 'at' values to those less than 190:
>>
>> . sysuse auto, clear
>> . gen lengthlt190=length if length<190
>> . lpoly price length if length<190, ///
>>       ker(rec) deg(1) bwidth(12) gen(L) at(lengthlt190)
>>
>>
>> -- Yulia
>> ymarchenko@stata.com
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
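
The -at()- suggestion quoted above can also be applied to the three bandwidths from the follow-up question. The following is only a sketch, assuming it is run from a do-file on the shipped auto dataset; the names -lengthlt190- and -L10-, -L20-, -L30- follow the thread, and the -foreach- loop is an illustrative convenience rather than anything the posters used. It does not explain how far -lpoly- extends fitted values by default; it only restricts the grid sample so that the question does not arise.

```stata
* Sketch only: restrict the grid sample with at(), as suggested in the thread.
sysuse auto, clear

* Grid variable: missing for cars outside the estimation range (length >= 190).
generate lengthlt190 = length if length < 190

* Same kernel and degree as in the thread, one fit per bandwidth.
foreach h of numlist 10 20 30 {
    lpoly price length if length < 190, kernel(rectangle) degree(1) ///
        bwidth(`h') nograph generate(L`h') at(lengthlt190)
}

sort length
* With the restricted grid, these rows are expected to show missing fitted values.
list length L10 L20 L30 if length >= 190
```

If the listed rows still contain nonmissing values, the -at()- variable is probably not the restricted one, so checking it with -summarize lengthlt190- is a reasonable first step.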

