...

You can gain some speed in regular Stata code by not generating a separate variable just to count the number of non-missings:

bysort rep78: gen mean=sum(price)/sum(price<.)

by rep78: keep if _n==_N

On my machine, this reduces the time required for the corrected version of Stas's code from 17.3 to 13.8 s.
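For anyone who wants to convince themselves that the running-sum version gives the same group means as -egen-, here is a minimal sketch. It is not part of the timed code: the variable name check, the dropping of observations with missing rep78, and the three artificially blanked prices are my own assumptions, added only so that the count of non-missings actually matters.

```stata
* Sketch only: compare the running-sum mean with -egen mean()- on auto.dta
sysuse auto, clear
drop if missing(rep78)            // assumption: ignore cars with no repair record

* Illustration only: blank a few prices so that sum(price<.) differs from _N
replace price = . in 1/3

* Reference result; check is a hypothetical helper variable
bysort rep78: egen double check = mean(price)

* sum() is a running sum that treats missing as 0, and (price<.) is 1 for
* non-missing prices, so the last observation of each group holds total/count
by rep78: gen double mean = sum(price)/sum(price<.)
by rep78: keep if _n==_N

* The two results should agree up to rounding
assert reldif(mean, check) < 1e-12
list rep78 mean, noobs
```

The -assert- stops with an error if the two methods disagree, so a silent run followed by the listing means they match.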

Michael Blasnik

----- Original Message ----- From: "Sergiy Radyakin" <serjradyakin@gmail.com>

To: <statalist@hsphsun2.harvard.edu>

Sent: Friday, April 25, 2008 9:12 PM

Subject: Re: st: speed question: -collapse- vs -egen-

Hello All!

Jeph has asked about an efficient way of creating a dataset with means of one variable over the categories of another variable. He suggested two possible solutions and Stas added a third one. Below I report performance of each of these methods and compare it with the fourth: a plugin. I use an expanded version of auto.dta and tabulate mean {price} by different levels of {rep78}.

1. All methods resulted in the following table of results*

   meanprice   rep78
      4564.5       1
    5967.625       2
    6429.233       3
      6071.5       4
        5913       5

2. The timing is as follows (Stata SE, Windows Server 2003, 32-bit)

   1:     33.80 /        1 =      33.7960
   2:     31.22 /        1 =      31.2190
   3:     21.33 /        1 =      21.3280
   4:      5.58 /        1 =       5.5780
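A comparison of this kind can be timed with Stata's -timer- commands. The following is only a sketch of such a harness, not the code that produced the figures above: the expansion factor of 1000 is an arbitrary assumption, the timer numbers do not necessarily correspond to methods 1 to 4 in this message, and the plugin is not included.

```stata
* Sketch of a timing harness; all settings here are assumptions
sysuse auto, clear
expand 1000                       // assumption: arbitrary expansion factor

timer clear

* Timer 1: -collapse-
preserve
timer on 1
collapse (mean) meanprice = price, by(rep78)
timer off 1
restore

* Timer 2: -egen-, then keep one observation per group
preserve
timer on 2
bysort rep78: egen meanprice = mean(price)
by rep78: keep if _n==_N
timer off 2
restore

* Timer 3: running sums, then keep the last observation per group
preserve
timer on 3
bysort rep78: gen meanprice = sum(price)/sum(price<.)
by rep78: keep if _n==_N
timer off 3
restore

* Report elapsed time for each timer
timer list
```

The -preserve- and -restore- calls sit outside the timed regions so that the cost of copying the data is not counted against any method.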

