Statalist



st: RE: RE: RE: RE: RE: RE: Efficient coding with -replace-


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: RE: RE: RE: Efficient coding with -replace-
Date   Sun, 5 Oct 2008 18:14:21 +0100

This was a point I tried to make in my first post. In other words, we
agree. 

Nick 
n.j.cox@durham.ac.uk 

Martin Weiss

If you take "efficiency" to denote "execution time", true. But if I say
that I deal with problems where I remain well clear of Stata's maximum
capabilities, but where consistency in terms of the -if- conditions is
important, then I find it "efficient" to code with a -macro- rather than
repeating the -if- clause over and over again, with the concomitant risk
of mistakes increasing in the number of times you try to remember your
-if- clause correctly.

Nick Cox

I am not completely clear what extra point you are making here. I take
the main argument to be that you find putting a qualifier in a local
macro a way of avoiding bugs and keeping your code consistent. Fair
enough, but that has nothing at all to do with efficiency. Indeed, the
more often you do it, the more you make your code a little less
efficient -- in machine terms.  

Nick 
n.j.cox@durham.ac.uk 

Martin Weiss

Maybe I should not have digressed from the initial problem. In Michael's
code, there is not much that needs to be done. I understood his question
as implying that he wants to apply the same -if- statement multiple
times. In his case, the statement is pretty easy, but concatenate a few
conditions and you end up with a messy expression which is all too
easily forgotten. Then I have always felt it is easiest to throw all
those conditions into a -macro- (at the top of your do-file, at the
start of a session) so half an hour later they will appear as they did
half an hour before, independently of the -sort- order.
On further consideration, I said that I would not include the -if- in
case I want to use the dialog boxes to try out something new but with
the same conditions attached. Execution speed was not part of my
recommendation, true. Then again, I have never hit Stata's limits in
terms of speed when using -replace-...

Nick Cox

Not so, or at least, it's more complicated than that. 

My short answer: On this information, Michael should leave his code as
is. 

My longer answer: 

First of all, the indirection of using a local macro is more or less
irrelevant to efficiency. In fact, if you recode as Martin suggested,
the code will be a smidgen _slower_, as Stata is obliged to store the
macro and then interpret it each time it is referenced. However, you
would have to strain to tell the difference in timings. But remember:
Stata is not a compiler! Interpretation always implies an overhead, just
that in many cases it is negligible. 

On a style point, I would not use a local macro in this example. I can't
see what real gain there is in terms of making the code more readable or
comprehensible, setting aside the efficiency issue. 

On a larger issue, -if- is always less efficient than an equivalent -in-
when there is a direct mapping between statements. What do I mean by
that? 

Suppose you know that there is a single observation, say 5890, for which
-id- is 80. 

Then you could and should code 

replace month = 1 in 5890
replace year =  1996 in 5890
replace failed= 1 in 5890

if efficiency were your only concern. Given a qualifier, -in 5890-,
Stata goes straight there, does the work, and bails out. Given a
qualifier, say -if id == 80-, Stata respects it the slow and stupid way
and tests every observation to see whether that condition is true or
false. (It never does the sort of smart thing that people are good at,
such as noticing whenever observations are ordered by -id- and taking
that into account.) So, for equivalent actions, -if- is much slower than
-in-.

This principle is sometimes codified on Statalist, tongue in cheek, as
Blasnik's Law, because Michael Blasnik has done more than anyone else to
publicise it. 
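
A rough way to see the difference for yourself is to time both
qualifiers on made-up data. The sketch below is only an illustration:
the dataset, the variable names month_if and month_in, the choice of
observation 80 and the repetition count are all assumptions, and the
timings will vary with machine and Stata version.

```stata
* Illustrative timing of -if- versus -in- on invented data
clear
set obs 1000000
gen long id = _n
gen byte month_if = .
gen byte month_in = .

timer clear

* Timer 1: -if- makes Stata test every observation on each pass
timer on 1
forvalues i = 1/100 {
    quietly replace month_if = 1 if id == 80
}
timer off 1

* Timer 2: -in- goes straight to observation 80
timer on 2
forvalues i = 1/100 {
    quietly replace month_in = 1 in 80
}
timer off 2

* Report elapsed time for both timers
timer list
```

Both loops change the same single observation, so any gap between the
two timers reflects the qualifier alone.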

However, 

1. Efficiency should never be your only concern. Code with -if id == 80-
is much more transparent than code with -in 5890-. Also, get the
observation number wrong or mess up the sort order and you have
introduced a hard-to-find bug. 

2. The "suppose" is a big one. How do you find out the observation
number if you don't know? You could do something like this 

gen long obsno = _n
su obsno if id == 80, meanonly
assert r(min) == r(max)
local where = r(min)
replace month = 1 in `where'

etc. 

But you can see there is a trade-off here. You have to do more work
beforehand to save work! In practice I would be most unlikely to bother.
In general being clever like this will not help much and might involve
extra work. Spending 2 minutes changing the code for 2 ms less machine
time is usually dopey unless you know that you are going to use that
code many, many times. 

3. I've taken Michael literally in his implication that only a single
observation is involved. The test above 

assert r(min) == r(max) 

tests whether that is so. 

At worst, the observations satisfying the -if- don't occur in a single
block so that -in- is not applicable to the data as they stand. (In
principle, that is always fixed by -sort-ing. Again in practice, there
is a trade-off in that -sort-ing may take up considerable machine time
itself.) 
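
If several observations share the id, one possible approach is to
-sort- first and then use an -in- range, checking that the range covers
exactly the observations wanted. This is a sketch on invented data; the
variable obsno and the local macros first, last and n are names
introduced here purely for illustration.

```stata
* Toy data: three observations with id 80, scattered through the dataset
clear
set obs 10
gen int id = cond(inlist(_n, 2, 5, 9), 80, _n)
gen byte failed = 0

* Sorting puts the observations with id 80 into a single block
sort id
gen long obsno = _n
su obsno if id == 80, meanonly
local first = r(min)
local last = r(max)
local n = r(N)

* Safeguards: the block is contiguous and starts and ends on id 80
assert `n' == `last' - `first' + 1
assert id[`first'] == 80 & id[`last'] == 80

* One -replace- over the whole block
replace failed = 1 in `first'/`last'
```

Whether the -sort- pays for itself depends on how large the dataset is
and how many -replace- statements follow.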

Nick
n.j.cox@durham.ac.uk 

(In a later post, Martin introduced what I think is another red herring
by talking about dialogs. If you care about machine time, don't use
dialogs.) 

Martin Weiss

-replace- expects "oldvar =exp", so no, I do not think there is a more
efficient way. Multiple instances of the same -if- qualifier always make
it advisable to throw it into a -local- 

local mycond " if id==80"
replace month = 1 `mycond'
replace year =  1996 `mycond'
replace failed= 1 `mycond'

Michael McCulloch

As part of a data audit, I'm recording some changes in my project 
do-file. Would there be a more efficient way to code the following 
changes, all of which involve the same observation?

replace month = 1 if id==80
replace year =  1996 if id==80
replace failed= 1 if id==80




