


Re: st: Get fitted values after locpoly (follow-up)

From   Austin Nichols <>
Subject   Re: st: Get fitted values after locpoly (follow-up)
Date   Wed, 21 Sep 2011 11:40:04 -0400

Partho Sarkar <> :
N.B. -lpoly- does produce an optimal bandwidth, defined in its
documentation, whereas -locpoly- (findit locpoly) uses an "entirely
inappropriate" default bandwidth, as it makes clear in its help file:

If width() is not specified,
then the "default" width is used; see [R] kdensity.
This default is entirely inappropriate for local polynomial smoothing.
        Roll your own.

Hence my quotes on "optimal" bandwidth below, which I realized needed

On Wed, Sep 21, 2011 at 11:34 AM, Austin Nichols
<> wrote:
> Partho Sarkar <> :
> You can certainly run -locpoly- (findit locpoly) or -lpoly- on a
> sample of randomly selected points, keep the "optimal" bandwidth
> chosen, and then reestimate using that bandwidth on the full sample,
> and predict out of sample as well.  But they do not do adaptive
> bandwidths, if that is what you had in mind.
> On Wed, Sep 21, 2011 at 11:25 AM, Partho Sarkar
> <> wrote:
>> Tania
>> I think I see where you are coming from, and so just a quick pointer:
>>  You are probably thinking in terms of  "kernel regression" (or local
>> polynomial regression) as usually understood in the machine learning
>> literature, in which the bandwidth is *optimally* selected (or
>> "tuned") from  an available "training set" or "memory set" of (xi,yi)
>> points, and *this bandwidth, together with the training set data*, can
>> then be used to "predict" the y0 value at some previously unseen "query"
>> point x0 outside the training set.  [In a sense, you could say that
>> the training set together with the bandwidth constitute the "model"].
>> But this is clearly not how locpoly is set up.  The bandwidth is
>> fixed, either by default or your choice.  And I am not sure, having
>> only tried a canned example with the program once very briefly, if
>> there is any scope to meaningfully partition the data into training
>> and query sets, as I think you might have in mind.  The user interface
>> certainly does not *explicitly* give the user such a choice. [But this
>> can be clarified by those more familiar with this command.]  There may
>> possibly be a roundabout way to get an approximation to what I
>> think you have in mind. But if I wanted to do the kind of kernel
>> regression I mention above, I would (without knowing what other Stata
>> programs may be available for this) go to R's CRAN archives.  I worked
>> on this a few years ago, so let me know and I could try to dig up
>> some of the sources, or just search CRAN.
>> Hope this helps
>> Partho
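
The procedure discussed in this thread (choose a bandwidth on a random subsample, re-estimate with that bandwidth on the full sample, then predict at a point outside the sample) can be sketched roughly as follows. This is a minimal illustration in plain Python, not what -lpoly- or -locpoly- do internally: it assumes a local-constant (Nadaraya-Watson) smoother with a Gaussian kernel, picks the bandwidth by leave-one-out cross-validation over a hypothetical candidate grid, and uses simulated data. All function and variable names are made up for the example.

```python
import math
import random


def gaussian_kernel(u):
    """Gaussian kernel weight for a scaled distance u."""
    return math.exp(-0.5 * u * u)


def kernel_predict(x0, xs, ys, h):
    """Local-constant (Nadaraya-Watson) estimate of y at x0 with bandwidth h."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed (x0 far from every x): fall back to the mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def loo_cv_error(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    n = len(xs)
    sse = 0.0
    for i in range(n):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        sse += (ys[i] - kernel_predict(xs[i], rest_x, rest_y, h)) ** 2
    return sse / n


def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv_error(xs, ys, h))


# Simulated data (an assumption for illustration): y = sin(x) + noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(400)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]

# Step 1: choose the bandwidth on a random subsample of the data.
sub = rng.sample(range(len(xs)), 100)
sub_x = [xs[i] for i in sub]
sub_y = [ys[i] for i in sub]
candidates = [0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical grid
h = select_bandwidth(sub_x, sub_y, candidates)

# Step 2: reuse that fixed bandwidth on the full sample to predict at a
# query point x0 that is not one of the observed x values.
x0 = 5.05
y0_hat = kernel_predict(x0, xs, ys, h)
print("bandwidth:", h, "prediction at x0:", y0_hat)
```

Here the "model" is just the stored (x, y) pairs plus the single chosen bandwidth; the bandwidth stays fixed across query points, so this is not an adaptive-bandwidth method.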

