st: RE: RE: RE: Efficient coding with -replace-

From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: RE: Efficient coding with -replace-
Date   Sun, 5 Oct 2008 21:09:09 +0200

Could you explain this a little more extensively, Liz? How do you know in
advance what you are changing from? Might it not be different a couple of
months down the road? Or are you hinting at something else?


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Elizabeth Allred
Sent: Sunday, October 05, 2008 8:39 PM
To: [email protected]
Subject: st: RE: RE: Efficient coding with -replace-

More important than efficiency, I think, the do file is the document of your
editing. The code referencing the id will be easy to understand when you
look at it 6 months from now. I might even go one step further and include
what you're changing FROM:

replace month = 1 if id==80 & month==4
replace year =  1996 if id==80 & year==1995
replace failed= 1 if id==80 & failed==.
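
A minimal sketch of that idea on made-up data. The three observations below are invented for illustration and are not from the original dataset. Because each -replace- names the old value, it changes nothing if the record no longer looks the way it did at audit time, and the closing -assert- documents the intended end state:

```stata
* Toy data standing in for the real file (all values are hypothetical)
clear
input id month year failed
79 3 1995 0
80 4 1995 .
81 7 1996 1
end

* Each edit states the old value, so it only fires on the record as audited
replace month  = 1    if id == 80 & month == 4
replace year   = 1996 if id == 80 & year == 1995
replace failed = 1    if id == 80 & missing(failed)

* Confirm that the record now holds the corrected values
assert month == 1 & year == 1996 & failed == 1 if id == 80
```

Running the three -replace- lines a second time should report zero changes, which is a cheap check that the edits have already been applied.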


>>> On 10/5/2008 at 12:22 PM, "Nick Cox" <[email protected]> wrote:
> Not so, or at least, it's more complicated than that. 
> My short answer: On this information, Michael should leave his code as
> is. 
> My longer answer: 
> First of all, the indirection of using a local macro is more or less
> irrelevant to efficiency. In fact, if you recode as Martin suggested,
> the code will be a smidgen _slower_, as Stata is obliged to store the
> macro and then interpret it each time it is referenced. However, you
> would have to strain to tell the difference in timings. But remember:
> Stata is not a compiler! Interpretation always implies an overhead, just
> that in many cases it is negligible. 
> On a style point, I would not use a local macro in this example. I can't
> see what real gain there is in terms of making the code more readable or
> comprehensible, setting aside the efficiency issue. 
> On a larger issue, -if- is always less efficient than an equivalent -in-
> when there is a direct mapping between statements. What do I mean by
> that? 
> Suppose you know that there is a single observation, say 5890, for which
> -id- is 80. 
> Then you could and should code 
> replace month = 1 in 5890
> replace year =  1996 in 5890
> replace failed= 1 in 5890
> if efficiency were your only concern. Given a qualifier, -in 5890-,
> Stata goes straight there, does the work, and bails out. Given a
> qualifier, say -if id == 80-, Stata respects it the slow and stupid way
> and tests every observation to see whether that condition is true or
> false. (It never does the sort of smart thing that people are good at,
> such as noticing whenever observations are ordered by -id- and taking
> that into account.) So, for equivalent actions, -if- is much slower than
> -in-.
> This principle is sometimes codified on Statalist, tongue in cheek, as
> Blasnik's Law, because Michael Blasnik has done more than anyone else to
> publicise it. 
> However, 
> 1. Efficiency should never be your only concern. Code with -if id == 80-
> is much more transparent than code with -in 5890-. Also, get the
> observation number wrong or mess up the sort order and you have
> introduced a hard-to-find bug. 
> 2. The "suppose" is a big one. How do you find out the observation
> number if you don't know? You could do something like this 
> gen long obsno = _n 
> su obsno if id == 80, meanonly  
> assert r(min) == r(max) 
> local where = r(min) 
> replace month = 1 in `where' 
> etc. 
> But you can see there is a trade-off here. You have to do more work
> beforehand to save work! In practice I would be most unlikely to bother.
> In general being clever like this will not help much and might involve
> extra work. Spending 2 minutes changing the code for 2 ms less machine
> time is usually dopey unless you know that you are going to use that
> code many, many times. 
> 3. I've taken Michael literally in his implication that only a single
> observation is involved. The test above 
> assert r(min) == r(max) 
> tests whether that is so. 
> At worst, the observations satisfying the -if- don't occur in a single
> block so that -in- is not applicable to the data as they stand. (In
> principle, that is always fixed by -sort-ing. Again in practice, there
> is a trade-off in that -sort-ing may take up considerable machine time
> itself.) 
> Nick
> [email protected] 
> (In a later post, Martin introduced what I think is another red herring
> by talking about dialogs. If you care about machine time, don't use
> dialogs.) 
> 
> Martin Weiss wrote:
> -replace- expects "oldvar =exp", so no, I do not think there is a more
> efficient way. Multiple instances of the same -if- qualifier always make
> it advisable to throw it into a -local-:
> local mycond " if id==80"
> replace month = 1 `mycond'
> replace year =  1996 `mycond'
> replace failed= 1 `mycond'
> 
> Michael McCulloch wrote:
> As part of a data audit, I'm recording some changes in my project 
> do-file. Would there be a more efficient way to code the following 
> changes, all of which involve the same observation?
> replace month = 1 if id==80
> replace year =  1996 if id==80
> replace failed= 1 if id==80
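
The trade-off between -if- and -in- described in the quoted message can be tried out on toy data. What follows is only a sketch under stated assumptions: the one-million-observation dataset, the variable name obsno and the timer numbers 1 and 2 are invented here and appear in nobody's actual do-file. It times one -replace- with -if-, then looks up the observation number once, checks that exactly one observation matches, and times the same -replace- with -in-:

```stata
* Toy data: one million observations with a unique id (size is arbitrary)
clear
set obs 1000000
gen long id = _n
gen byte month = 4

timer clear

* Version 1: -if- makes Stata test the condition on every observation
timer on 1
replace month = 1 if id == 80
timer off 1

* Undo the change so that both versions do the same work
replace month = 4 if id == 80

* Version 2: find the observation number once, then go straight there
gen long obsno = _n
summarize obsno if id == 80, meanonly
* Stop here unless exactly one observation has id == 80
assert r(N) == 1
local where = r(min)

timer on 2
replace month = 1 in `where'
timer off 2

* Show elapsed time for timers 1 and 2
timer list
assert month[`where'] == 1
```

The timings depend on the machine and the size of the data, so none are quoted here. Note also that the lookup itself (-generate- plus -summarize ... if-) scans the whole dataset, which is the extra work beforehand that the quoted message warns about.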