Statalist The Stata Listserver



Re: st: weighting for Lowess smoothing


From   ymarchenko@stata.com (Yulia Marchenko, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: weighting for Lowess smoothing
Date   Tue, 02 May 2006 15:48:17 -0500

Austin Nichols <austinnichols@gmail.com> demonstrates an example that compares
the speed of the -lowess- and -locpoly- commands:

> It may be that the labor-intensive part of the -lowess- command is compiled
> code (_LOWESS is built-in in both Stata 8 and Stata 9), but it does run a
> whole heck of a lot slower than -locpoly- (which is why I assumed it was
> interpreted code, I guess).  In this simple example, -lowess- takes more than
> 15 times as long (2 min versus 6 sec).

> clear
> sysuse auto
> replace wei=round(wei/10)
> expand wei
> set rmsg on
> locpoly price mpg, name(locpoly) width(2)
> lowess price mpg, name(lowess) bw(1)

-lowess- is significantly slower than -locpoly- in this example because of
the number of weighted regressions each command performs.  By default,
-locpoly- uses min(_N,50) equally spaced smoothing points, whereas -lowess-
estimates the smooth at each value of the explanatory variable (mpg in this
example).  That is, for -lowess- the number of smoothing points equals _N,
the number of observations.  In the example above, -locpoly- performs only 50
weighted regressions, while -lowess- runs _N = 22344 of them.  It is difficult
to compare the speed of the two commands directly because each uses a
different weighting procedure.  However, the following gives a clearer
picture.

   clear
   sysuse auto
   replace wei=round(wei/10)
   expand wei
   keep if _n<1000
   set rmsg on
   locpoly price mpg, width(1) nograph at(mpg)
   lowess price mpg, bw(1) nograph mean

On my computer I got the following results:

   . locpoly price mpg, width(1) nograph at(mpg)
   r; t=0.53 14:51:13

   . lowess price mpg, bw(1) nograph mean
   r; t=0.13 14:51:13

We can now see that -lowess- runs faster (0.13 versus 0.53 seconds).  By
specifying the option -at(mpg)-, we request that -locpoly- evaluate the smooth
at each value of the variable mpg, so both commands now perform the same
number of regressions.  Also, by default, -locpoly- performs local mean
smoothing; the option -mean- requests the same from -lowess-.  If graphs are
not needed, you can specify -nograph- to save the time required to generate
them.

Both commands use C code to perform the regressions, and the speed of each
depends heavily on the number of smoothing points.  If the dataset is large,
-lowess- will take a long time to run.  -locpoly- will run faster unless
-at()- is specified or a large number of smoothing points is requested with
-n()-.
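
To see how the running time of -locpoly- grows with the number of smoothing
points, one could time a few runs on the same data.  The following is only a
sketch that reuses the setup above; the value n(500) is an arbitrary choice
for illustration, and timings will differ from machine to machine.

```stata
clear
sysuse auto
replace weight = round(weight/10)
expand weight
keep if _n < 1000
set rmsg on

* default grid: min(_N,50) equally spaced smoothing points
locpoly price mpg, width(1) nograph

* denser grid (500 is an arbitrary illustrative value)
locpoly price mpg, width(1) nograph n(500)

* one smoothing point per observation, as -lowess- does
locpoly price mpg, width(1) nograph at(mpg)

set rmsg off
```

Comparing the three r; t= values reported by -set rmsg on- should show how
much of the cost comes from the number of smoothing points alone.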


 -- Yulia
ymarchenko@stata.com