



Re: st: Get fitted values after locpoly (follow-up)


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Get fitted values after locpoly (follow-up)
Date   Wed, 21 Sep 2011 11:40:04 -0400

Partho Sarkar <partho.ss+lists@gmail.com> :
N.B. -lpoly- does produce an optimal bandwidth, defined in its
documentation, whereas -locpoly- (findit locpoly) uses an "entirely
inappropriate" default bandwidth, as it makes clear in its help file:

    If width() is not specified, then the "default" width is used;
    see [R] kdensity.  This default is entirely inappropriate for
    local polynomial smoothing.  Roll your own.

Hence my quotes on "optimal" bandwidth below, which I realized needed
explanation.
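
For reference, the -kdensity- default width mentioned in that help file is, as far as I recall from [R] kdensity, a Silverman-style rule of thumb, 0.9*m/n^(1/5) with m = min(sd, IQR/1.349). The Python sketch below is only an illustration of that formula, not Stata's own code: the function name kdensity_default_width is made up here, and NumPy's percentile definition may differ slightly from Stata's. It shows that the width is computed from x alone, so it cannot reflect how quickly y changes with x.

```python
import numpy as np

def kdensity_default_width(x):
    """Rule-of-thumb density bandwidth: 0.9 * m / n^(1/5),
    with m = min(sample sd, IQR / 1.349).
    Hypothetical helper; approximates the documented kdensity default."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)                      # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])   # NumPy's quantile rule (assumption)
    m = min(sd, (q75 - q25) / 1.349)
    return 0.9 * m / n ** 0.2

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)

# Two very different regression functions on the same x ...
y_smooth = 0.5 * x + rng.normal(size=500)
y_wiggly = np.sin(3 * x) + rng.normal(scale=0.1, size=500)

# ... get the same width, because y never enters the formula.
h = kdensity_default_width(x)
print(f"width used for both y_smooth and y_wiggly: {h:.3f}")
```

A width tuned for estimating the density of x therefore has no particular claim to being a good smoothing width for the regression of y on x.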

On Wed, Sep 21, 2011 at 11:34 AM, Austin Nichols
<austinnichols@gmail.com> wrote:
> Partho Sarkar <partho.ss+lists@gmail.com> :
> You can certainly run -locpoly- (findit locpoly) or -lpoly- on a
> sample of randomly selected points, keep the "optimal" bandwidth
> chosen, and then reestimate using that bandwidth on the full sample,
> and predict out of sample as well.  But they do not do adaptive
> bandwidths, if that is what you had in mind.
>
> On Wed, Sep 21, 2011 at 11:25 AM, Partho Sarkar
> <partho.ss+lists@gmail.com> wrote:
>> Tania
>>
>> I think I see where you are coming from, and so just a quick pointer:
>>
>> You are probably thinking in terms of "kernel regression" (or local
>> polynomial regression) as usually understood in the machine learning
>> literature, in which the bandwidth is *optimally* selected (or
>> "tuned") from an available "training set" or "memory set" of (xi,yi)
>> points, and *this bandwidth, together with the training set data*, can
>> then be used to "predict" the y0 value at some previously unseen "query"
>> point x0 outside the training set.  [In a sense, you could say that
>> the training set together with the bandwidth constitutes the "model"].
>>
>> But this is clearly not how locpoly is set up.  The bandwidth is
>> fixed, either by default or by your choice.  And I am not sure, having
>> only tried a canned example with the program once very briefly, if
>> there is any scope to meaningfully partition the data into training
>> and query sets, as I think you might have in mind.  The user interface
>> certainly does not *explicitly* give the user such a choice. [But this
>> can be clarified by those more familiar with this command.]  There may
>> possibly be a roundabout way to get an approximation to what I
>> think you have in mind. But if I wanted to do the kind of kernel
>> regression I mention above, I would (without knowing what other Stata
>> programs may be available for this) go to R's CRAN archives.  I worked
>> on this a few years ago, so let me know and I could try to dig up
>> some of the sources, or just search CRAN.
>>
>> Hope this helps
>>
>> Partho
>
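
The workflow discussed in this thread (tune a bandwidth on a random subsample, refit with that fixed bandwidth on the full sample, then predict at query points outside the sample) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not what -locpoly- or -lpoly- do internally: the Gaussian kernel, the local linear fit, the leave-one-out cross-validation used in place of a rule-of-thumb bandwidth, the simulated data, and the names local_linear and loo_cv_bandwidth are all choices made here for the example.

```python
import numpy as np

def local_linear(x_train, y_train, x_query, h):
    """Local linear fit at each query point (Gaussian kernel, fixed bandwidth h)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    fitted = np.empty(x_query.size)
    for i, x0 in enumerate(x_query):
        d = x_train - x0
        w = np.exp(-0.5 * (d / h) ** 2)            # kernel weights
        X = np.column_stack([np.ones_like(d), d])  # regress y on (1, x - x0)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
        fitted[i] = beta[0]                        # intercept = fitted value at x0
    return fitted

def loo_cv_bandwidth(x, y, candidates):
    """Return the candidate h with the smallest leave-one-out squared error."""
    idx = np.arange(x.size)
    best_h, best_err = None, np.inf
    for h in candidates:
        err = 0.0
        for i in idx:
            keep = idx != i
            err += (y[i] - local_linear(x[keep], y[keep], x[i], h)[0]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h

# Simulated data (assumption): y = sin(x) + noise on [0, 2*pi].
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(0, 2 * np.pi, size=n)
y = np.sin(x) + rng.normal(scale=0.2, size=n)

# Step 1: choose the bandwidth on a random subsample of 100 points.
sub = rng.choice(n, size=100, replace=False)
candidates = [0.1, 0.2, 0.3, 0.5, 0.8, 1.2]
h_sub = loo_cv_bandwidth(x[sub], y[sub], candidates)

# Step 2: keep that bandwidth fixed and use the full sample as the "training set".
# Step 3: predict at query points that are not in the sample.
x_query = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
fitted_query = local_linear(x, y, x_query, h_sub)

print("bandwidth chosen on subsample:", h_sub)
print("fitted values at query points:", np.round(fitted_query, 3))
```

The bandwidth stays fixed across all query points, so this is not an adaptive-bandwidth smoother. Note also that a bandwidth tuned on a small subsample will tend to be wider than one tuned on the full sample, since the best smoothing width usually shrinks as n grows.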


