# Re: st: weighting for Lowess smoothing

From: ymarchenko@stata.com (Yulia Marchenko, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: weighting for Lowess smoothing
Date: Tue, 02 May 2006 15:48:17 -0500

```
Austin Nichols <austinnichols@gmail.com> demonstrates an example that compares
the speed of the -lowess- and -locpoly- commands:

> It may be that the labor-intensive part of the -lowess- command is compiled
> code (_LOWESS is built-in in both Stata 8 and Stata 9), but it does run a
> whole heck of a lot slower than -locpoly- (which is why I assumed it was
> interpreted code, I guess).  In this simple example, -lowess- takes more than
> 15 times as long (2 min versus 6 sec).

> clear
> sysuse auto
> replace wei=round(wei/10)
> expand wei
> set rmsg on
> locpoly price mpg, name(locpoly) width(2)
> lowess price mpg, name(lowess) bw(1)

-lowess- is significantly slower than -locpoly- in this example because of
the number of weighted regressions each command performs.  By default,
-locpoly- uses min(_N,50) equally spaced smoothing points, whereas -lowess-
estimates the smooth at each value of the explanatory variable (mpg in this
example), so its number of smoothing points equals _N, the number of
observations.  In the example above, -locpoly- performs only 50 weighted
regressions, while -lowess- runs _N = 22344 of them.  It is difficult to
compare the speed of the two commands directly because each uses a different
weighting procedure.  However, the following gives a clearer picture.

clear
sysuse auto
replace wei=round(wei/10)
expand wei
keep if _n<1000
set rmsg on
locpoly price mpg, width(1) nograph at(mpg)
lowess price mpg, bw(1) nograph mean

On my computer I got the following results:

. locpoly price mpg, width(1) nograph at(mpg)
r; t=0.53 14:51:13

. lowess price mpg, bw(1) nograph mean
r; t=0.13 14:51:13

We can see now that -lowess- runs faster.  With the option -at(mpg)-, we
request that -locpoly- evaluate the smooth at each value of the variable mpg,
so both commands now perform the same number of regressions (one per
observation, 999 after -keep if _n<1000-).  Also, by default, -locpoly-
performs local mean smoothing, so the option -mean- is specified with
-lowess- to request mean smoothing as well.  If graphs are not needed, you
can use -nograph- to save the time required to generate them.

Both commands use C code to perform the regressions, and the speed of each
depends heavily on the number of smoothing points.  If the dataset is large,
-lowess- will take a long time to run.  -locpoly- will run faster unless
-at()- is specified or a large number of smoothing points is requested with
-n()-.

-- Yulia
ymarchenko@stata.com
```
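The point about smoothing points can be illustrated outside Stata with a minimal sketch. This is not the code behind -lowess- or -locpoly-: the function `local_mean`, the Epanechnikov kernel, the width of 2, and the synthetic data are all assumptions chosen for illustration. It only shows that a smoother evaluated on a grid of min(n, 50) points performs far fewer weighted fits than one evaluated at every observation.

```python
import random


def local_mean(x, y, points, width):
    """Kernel-weighted local mean (a degree-0 local fit) at each smoothing point.

    Hypothetical helper for illustration only. Returns the smoothed values
    and the number of weighted fits performed (one per smoothing point).
    """
    fits = 0
    smooth = []
    for p in points:
        num = 0.0
        den = 0.0
        for xi, yi in zip(x, y):
            u = (xi - p) / width
            if abs(u) < 1.0:
                w = 0.75 * (1.0 - u * u)  # Epanechnikov weight (assumed kernel)
                num += w * yi
                den += w
        smooth.append(num / den if den > 0 else float("nan"))
        fits += 1
    return smooth, fits


# Synthetic data standing in for price and mpg (not the auto dataset).
random.seed(1)
n = 1000
x = [random.uniform(10, 40) for _ in range(n)]
y = [20000 - 400 * xi + random.gauss(0, 1000) for xi in x]

# Grid of min(n, 50) equally spaced smoothing points (assumes at least 2 points).
m = min(n, 50)
lo, hi = min(x), max(x)
grid = [lo + i * (hi - lo) / (m - 1) for i in range(m)]

smooth_grid, fits_grid = local_mean(x, y, grid, width=2.0)  # grid evaluation
smooth_all, fits_all = local_mean(x, y, x, width=2.0)       # one fit per observation

print("weighted fits on the grid:       ", fits_grid)
print("weighted fits at each observation:", fits_all)
```

Each fit scans all n observations, so the grid version does roughly 50 * n weight evaluations while the per-observation version does n * n. That is the same scaling argument made in the post for the default behavior of the two commands.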