


Re: st: "doing anything the quickest way does no harm"

From   "Nick Cox" <>
To   <>
Subject   Re: st: "doing anything the quickest way does no harm"
Date   Wed, 28 Apr 2010 19:39:05 +0100

Clyde omits the context, which does make a little difference to the
argument. The context is a thread in which I was recommending -count- in
preference to -summarize-, so we agree. The comment was meant as a
summary comment given three ways of doing things, which don't differ
much in clarity. (Please see below.) I've also written at least three
articles on the virtues of -count-. 

On the larger issue, I also agree with him: speed alone is far from the
only issue, and clarity is ultimately more important in any code you have to
read. I've said as much elsewhere in the Stata world, and rather
frequently too. That comment on speed is not one I want to see quoted
out of the context just given. 


Original posting at

Good question. I decided to do some timings to support -- or rebut -- my
feeling that -count-, which just counts, should be faster than -summarize,
meanonly-, which does other stuff too, and that in turn faster than
-summarize-, which does still more. But although that is the order, the
timings are closer than I guessed. Still, doing anything the quickest way
does no harm and may give valuable speed-up for large problems.

Here is one test script. Compare your experiences: 

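* 100,000 uniform random numbers; -set rmsg on- reports the time each command takes
set obs 100000
set seed 2803
gen y = runiform()
set rmsg on

* -count- only counts
qui forval i = 1/10000 {
	count if y > 0.5
}

* -summarize, meanonly- also computes the mean, min and max
qui forval i = 1/10000 {
	su y if y > 0.5, meanonly
}

* plain -summarize- also computes the variance
qui forval i = 1/10000 {
	su y if y > 0.5
}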

My timings were t=187.49; 254.49; 313.38, which no doubt shows up the
Mesolithic age of my machine. 
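
For anyone who would rather collect the three timings in one place than
read them off -rmsg- output, here is a minimal sketch using -timer-. It is
not part of the original script: the timer numbers 1 to 3 and the reduced
loop count of 1000 are arbitrary choices made for illustration, and -clear-
assumes nothing in memory needs keeping.

```stata
* Sketch only: same data as the test script, timed with -timer-.
* The loop count (1000) and timer numbers (1-3) are arbitrary assumptions.
clear
set obs 100000
set seed 2803
gen y = runiform()

timer clear

* Timer 1: -count-
timer on 1
quietly forvalues i = 1/1000 {
	count if y > 0.5
}
timer off 1

* Timer 2: -summarize, meanonly-
timer on 2
quietly forvalues i = 1/1000 {
	summarize y if y > 0.5, meanonly
}
timer off 2

* Timer 3: plain -summarize-
timer on 3
quietly forvalues i = 1/1000 {
	summarize y if y > 0.5
}
timer off 3

* Elapsed seconds for each timer are listed and left in r(t1), r(t2), r(t3)
timer list
```

Timings will vary with machine and Stata version, so only the ordering and
rough ratios are worth comparing.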

Clyde Schechter 

Perhaps I am just a superannuated, grouchy programmer who suffered through
the spaghetti code era of the 1960's and 1970's, still fighting the last
war. But I take issue with the statement "doing anything the quickest
way does no harm."

I think there is a strong consensus in the field of computer programming
that clear, transparent code generally trumps speed gains accomplished
through obscure tweaks because, in the long run, being able to read and
easily maintain your code saves more time, and the human time so saved
is more valuable than the computer time salvaged.  I think there is also
consensus that if your program really runs too slowly, you are usually
better off substituting a faster algorithm (e.g., use binary table
searches instead of linear ones), or moving to faster hardware.  You
probably get greater time savings those ways than by squeezing out a few
microseconds here and there with opaque tricks.  And when you come back
to your code later, you'll understand what you did.

Now, in the particular thread that led to this quote, these considerations
point in the same direction.  Using -count-, when what you want is a count
and not other descriptive statistics, is better than using -summarize-
because it's more transparent: it means what it says.  Even if -count-
were slower than -summarize-, I would still use -count- there!  In the
rare event that one really needs to squeeze out a little extra time by
using an opaque command, at least one should put some comments in the
code explaining what's going on.
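
To make the transparency point concrete, here is a small sketch, not taken
from the thread, that gets the same number both ways. The simulated
variable y and the cutoff 0.5 mirror the test script quoted above; the
local macro names n_count and n_su are made up for illustration.

```stata
* Sketch only: the same count obtained two ways on simulated data.
* The local names n_count and n_su are hypothetical.
clear
set obs 100000
set seed 2803
gen y = runiform()

* Transparent: -count- says exactly what is wanted.
quietly count if y > 0.5
local n_count = r(N)

* Also works, but the reader has to know that -summarize, meanonly-
* leaves the number of observations used in r(N).
quietly summarize y if y > 0.5, meanonly
local n_su = r(N)

display "count: `n_count'   summarize, meanonly: `n_su'"
```

The two agree here because y has no missing values; the second form is the
one that deserves an explanatory comment if it is used at all.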

I suppose I would have passed this over without comment were it not for
the fact that it came up on the same day (I read statalist-digest) when
somebody suggested using a global macro--a device that is potentially
dangerous and should be used very, very sparingly.

End of rant.
