Re: st: lpoly and nonmissing fitted values where the dependent variable is missing


From   ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date   Mon, 09 Aug 2010 11:09:14 -0500

Alex Olssen <alex.olssen@gmail.com> asks why -lpoly- produces smoothed values
outside the range of <x>-values (the variable -length- below) as defined by an
-if- statement:

> I am doing a regression discontinuity analysis and want to understand how
> -lpoly- is working.  I use the -lpoly- options -gen- and -at- to create
> fitted values for my local linear regression.  Due to the nature of
> regression discontinuity I look at two subgroups separately.  Fitted values
> are generated even for observations outside the subgroup.  I want to
> understand how it chooses where to fit them.
>
> For example,
>
> sysuse auto, clear
> lpoly price length if length<190, ker(rec) deg(1) bwidth(12) gen(L) at(length)
> sort length
> br L length
> 
> Cars with lengths up to 212cm long have fitted values.  Does anyone know why?
> 
> Note the if statement causes no problems.  If I gen lengthlt190=length if
> length<190 and then lpoly price lengthlt190 the results are identical.

-lpoly- uses two notions of a sample: an estimation sample and a grid sample.
The estimation sample is the set of observations used in the locally
weighted regression fits.  The grid sample is the set of grid points
at which the smooth is evaluated.  To link this to the documentation
(-[R] lpoly-, pp. 939-940), the estimation sample defines the set of x_i's
used to compute the regression coefficients in formula (2) in the
documentation, and the grid sample defines the set of grid points x_0.

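An -if- condition affects only the estimation sample, not the grid sample.
That is why -at(length)- in Alex's example can still produce smoothed values
at lengths of 190 or more: -length- is not restricted by -if-, so it still
supplies grid points there.  To restrict the range of grid points, Alex should
create a new variable that is nonmissing only in the desired range and specify
it in the -at()- option.  Continuing Alex's example, we can use the
-lengthlt190- variable in the -at()- option to restrict the 'at' values to
those less than 190: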

  . sysuse auto, clear
  . gen lengthlt190=length if length<190 
  . lpoly price length if length<190, ker(rec) deg(1) bwidth(12) gen(L) at(lengthlt190)
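
One way to check that the grid is now restricted is sketched below as a
do-file fragment.  The -nograph- option and the spelled-out option names are
additions for this sketch, and no output is shown because it has not been
run here.  If the restriction works as described, the count should be 0.

```stata
sysuse auto, clear
* grid variable: equals length below 190, missing otherwise
generate lengthlt190 = length if length < 190
* same fit as above; nograph only suppresses the plot
lpoly price length if length < 190, kernel(rectangle) degree(1) ///
    bwidth(12) generate(L) at(lengthlt190) nograph
* number of smoothed values at lengths of 190 or more
count if !missing(L) & length >= 190
* range of grid points that received a smoothed value
summarize lengthlt190 if !missing(L)
```

Running the same two checks after the original call with -at(length)- gives a
direct comparison of the two grids.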


-- Yulia
ymarchenko@stata.com