Statalist



st: RE: RE: Efficient coding with -replace-


From   "Elizabeth Allred" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Efficient coding with -replace-
Date   Sun, 05 Oct 2008 14:38:43 -0400

More important than efficiency, I think, is that the do-file documents your editing. Code that refers to the id will be easy to understand when you look at it six months from now. I might even go one step further and include what you're changing FROM:

replace month = 1 if id==80 & month==4
replace year =  1996 if id==80 & year==1995
replace failed= 1 if id==80 & failed==.
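
A minimal sketch of the same idea, using a toy dataset invented purely for illustration: putting -count- and -assert- before each -replace- makes the do-file stop with an error if the FROM condition no longer matches exactly one observation (for example, because the raw data have since been corrected at source).

```stata
* Toy data invented for illustration only
clear
input id month year failed
79 4 1995 0
80 4 1995 .
81 6 1997 1
end

* Each correction first checks that it will change exactly one observation
count if id == 80 & month == 4
assert r(N) == 1
replace month = 1 if id == 80 & month == 4

count if id == 80 & year == 1995
assert r(N) == 1
replace year = 1996 if id == 80 & year == 1995

count if id == 80 & failed == .
assert r(N) == 1
replace failed = 1 if id == 80 & failed == .
```

The extra lines cost a little typing, but they turn the audit trail into something Stata checks every time the do-file runs.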

Liz

>>> On 10/5/2008 at 12:22 PM, in message <031173627889364697C50B3B266CBB8A01C08BB8@GEOGMAIL.geog.ad.dur.ac.uk>, "Nick Cox" <[email protected]> wrote:
> Not so, or at least, it's more complicated than that. 
> 
> My short answer: On this information, Michael should leave his code as
> is. 
> 
> My longer answer: 
> 
> First of all, the indirection of using a local macro is more or less
> irrelevant to efficiency. In fact, if you recode as Martin suggested,
> the code will be a smidgen _slower_, as Stata is obliged to store the
> macro and then interpret it each time it is referenced. However, you
> would have to strain to tell the difference in timings. But remember:
> Stata is not a compiler! Interpretation always implies an overhead; it's
> just that in many cases the overhead is negligible. 
> 
> On a style point, I would not use a local macro in this example. I can't
> see what real gain there is in terms of making the code more readable or
> comprehensible, setting aside the efficiency issue. 
> 
> On a larger issue, -if- is always less efficient than an equivalent -in-
> when there is a direct mapping between statements. What do I mean by
> that? 
> 
> Suppose you know that there is a single observation, say 5890, for which
> -id- is 80. 
> 
> Then you could and should code 
> 
> replace month = 1 in 5890
> replace year =  1996 in 5890
> replace failed= 1 in 5890
> 
> if efficiency were your only concern. Given a qualifier, -in 5890-,
> Stata goes straight there, does the work, and bails out. Given a
> qualifier, say -if id == 80-, Stata respects it the slow and stupid way
> and tests every observation to see whether that condition is true or
> false. (It never does the sort of smart thing that people are good at,
> such as noticing when observations are sorted by -id- and taking
> that into account.) So, for equivalent actions, -if- is much slower than
> -in-.
> 
> This principle is sometimes codified on Statalist, tongue in cheek, as
> Blasnik's Law, because Michael Blasnik has done more than anyone else to
> publicise it. 
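
A rough way to see the difference for yourself, assuming Stata's -timer- command and an artificial dataset of one million observations (the size and the variable names are hypothetical). Absolute timings depend on the machine; the point is only that the -if- version has to test every observation, while the -in- version goes straight to observation 80.

```stata
* Artificial data: one million observations, id equal to the observation number
clear
set obs 1000000
gen long id = _n
gen byte month = 4

timer clear

* -if-: Stata evaluates id == 80 for every observation
timer on 1
replace month = 1 if id == 80
timer off 1

* Put the old value back so that the second run does the same work
replace month = 4 in 80

* -in-: Stata goes directly to observation 80
timer on 2
replace month = 1 in 80
timer off 2

* Compare timer 1 (-if-) with timer 2 (-in-)
timer list
```

Here -id- equals -_n- by construction, which is exactly the "direct mapping" that makes -in 80- equivalent to -if id == 80-.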
> 
> However, 
> 
> 1. Efficiency should never be your only concern. Code with -if id == 80-
> is much more transparent than code with -in 5890-. Also, get the
> observation number wrong or mess up the sort order and you have
> introduced a hard-to-find bug. 
> 
> 2. The "suppose" is a big one. How do you find out the observation
> number if you don't know it? You could do something like this: 
> 
> gen long obs = _n 
> su obs if id == 80, meanonly  
> assert r(min) == r(max) 
> local where = r(min) 
> replace month = 1 in `where' 
> 
> etc. 
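
A self-contained version of that recipe, assuming a small made-up dataset in which -id- 80 happens to sit in observation 3. The helper variable -obs- is a hypothetical name, and -assert r(N) == 1- is used here as the check that exactly one observation has -id- 80.

```stata
* Made-up data for illustration: id 80 is in observation 3
clear
input id month year failed
78 2 1994 0
79 4 1995 0
80 4 1995 .
81 6 1997 1
end

* Helper variable holding each observation's number (hypothetical name)
gen long obs = _n
summarize obs if id == 80, meanonly
assert r(N) == 1
local where = r(min)

* Apply all three corrections directly to that observation
replace month = 1 in `where'
replace year = 1996 in `where'
replace failed = 1 in `where'

drop obs
```

Note that the -summarize ... if id == 80- step still makes one full pass through the data, so this only pays off when the same observation is reused by several commands.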
> 
> But you can see there is a trade-off here. You have to do more work
> beforehand to save work! In practice I would be most unlikely to bother.
> In general, being clever like this will not help much and might involve
> extra work. Spending 2 minutes changing the code for 2 ms less machine
> time is usually dopey unless you know that you are going to use that
> code many, many times. 
> 
> 3. I've taken Michael literally in his implication that only a single
> observation is involved. The test above 
> 
> assert r(min) == r(max) 
> 
> tests whether that is so. 
> 
> At worst, the observations satisfying the -if- don't occur in a single
> block so that -in- is not applicable to the data as they stand. (In
> principle, that is always fixed by -sort-ing. Again in practice, there
> is a trade-off in that -sort-ing may take up considerable machine time
> itself.) 
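
The -sort- route mentioned above can be sketched as follows, again with invented data in which -id- 80 occurs twice in non-adjacent observations. After -sort id- the matching observations form one contiguous block, so the range -in `first'/`last'- selects the same observations as -if id == 80-. The names -obs-, -first- and -last- are illustrative only, and the -sort- itself both costs time and changes the order of the data.

```stata
* Invented data: id 80 appears twice, not next to each other
clear
input id month
80 4
79 4
81 6
80 5
end

* Sorting makes the id == 80 observations contiguous
sort id
gen long obs = _n
summarize obs if id == 80, meanonly
local first = r(min)
local last = r(max)

* Check that the range covers only the id == 80 block, then use it
assert id == 80 in `first'/`last'
replace month = 1 in `first'/`last'
drop obs
```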
> 
> Nick
> [email protected] 
> 
> (In a later post, Martin introduced what I think is another red herring
> by talking about dialogs. If you care about machine time, don't use
> dialogs.) 
> 
> Martin Weiss
> 
> -replace- expects "oldvar = exp", so no, I do not think there is a more
> efficient way. Multiple instances of the same -if- qualifier always make
> it advisable to throw it into a -local-:
> 
> local mycond " if id==80"
> replace month = 1 `mycond'
> replace year =  1996 `mycond'
> replace failed= 1 `mycond'
> 
> Michael McCulloch
> 
> As part of a data audit, I'm recording some changes in my project 
> do-file. Would there be a more efficient way to code the following 
> changes, all of which involve the same observation?
> 
> replace month = 1 if id==80
> replace year =  1996 if id==80
> replace failed= 1 if id==80
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search 
> *   http://www.stata.com/support/statalist/faq 
> *   http://www.ats.ucla.edu/stat/stata/


