
st: by vs while


From: Christopher Baum <baum@bc.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: by vs while
Date: Sun, 27 Jun 2004 06:36:02 -0400

In a recent posting, Subhankar said:

The -by- command is so much faster than the -while- command...

If I compare

by month: regress returns factor

vs.

local i = 1
while `i' <= 1000 {
    regress returns factor if `i' == month
    local i = `i' + 1
}

I find that the -by- command is at least 15-20 times faster than the -while-
loop.



The speed differential here has nothing to do with -by- vs. -while-. The clumsy part of your code is the -if `i' == month- qualifier: Stata must examine EACH observation in the dataset on EVERY pass through the loop, so 1000 passes mean 1000 full scans of the data. If the data are sorted by month, each month's observations form one contiguous block, and replacing the -if- with an -in- range (first obs/last obs of that block) will speed this up immensely, because only the observations in that range need to be considered.

If the number of obs per month is constant, this can be done with a simple counter. If the number of obs per month varies, it is worth passing through the dataset ONCE to set up two integer sequences holding the first and last obs for each month, and referencing those in the -in- qualifier. That fix will, I imagine, remove most of the speed differential between the two methods.
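For what it is worth, here is a minimal sketch of the -in- approach for the case where the number of obs per month varies. It is only an illustration, not tested on the original poster's data: the toy dataset and the helper variables obsno, first, and last are my own assumptions, while month, returns, and factor follow the names in the original posting.

```stata
* Toy data for illustration only (hypothetical; replace with your own dataset)
clear
set seed 12345
set obs 1200
generate month   = ceil(12*runiform())    // unequal number of obs per month
generate factor  = rnormal()
generate returns = 0.5*factor + rnormal()

* Each month's observations must be contiguous for -in- to work
sort month

* ONE pass through the data: store the first and last obs number of each month
generate long obsno = _n
by month: generate long first = obsno[1]
by month: generate long last  = obsno[_N]

* Loop over the blocks, jumping from one month's first obs to the next month's
local i = 1
while `i' <= _N {
    local f = first[`i']
    local l = last[`i']
    regress returns factor in `f'/`l'
    local i = `l' + 1
}
```

With a constant number of obs per month (say 100, again an assumption), first and last are not needed: the range is simply `i'/`=`i'+99' and the counter advances by 100 on each pass.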

Bottom line: in a large (especially panel) dataset, never use the -if- qualifier to pick out chunks of the data, especially when you are looping over those chunks. It is horribly inefficient.

Kit