
From: "Martin Weiss" <martin.weiss1@gmx.de>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: RE: RE: RE: Efficient coding with -replace-

Date: Sun, 5 Oct 2008 21:09:09 +0200

Could you explain this a little more extensively, Liz? How do you know in advance what you are changing from? Might it not be different a couple of months down the road? Or are you hinting at something else?

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Elizabeth Allred
Sent: Sunday, October 05, 2008 8:39 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Efficient coding with -replace-

More important than efficiency, I think, the do file is the document of your editing. The code referencing the id will be easy to understand when you look at it 6 months from now. I might even go one step further and include what you're changing FROM:

    replace month = 1 if id==80 & month==4
    replace year = 1996 if id==80 & year==1995
    replace failed= 1 if id==80 & failed==.

Liz

>>> On 10/5/2008 at 12:22 PM, in message <031173627889364697C50B3B266CBB8A01C08BB8@GEOGMAIL.geog.ad.dur.ac.uk>, "Nick Cox" <n.j.cox@durham.ac.uk> wrote:
> Not so, or at least, it's more complicated than that.
>
> My short answer: On this information, Michael should leave his code as
> is.
>
> My longer answer:
>
> First of all, the indirection of using a local macro is more or less
> irrelevant to efficiency. In fact, if you recode as Martin suggested,
> the code will be a smidgen _slower_, as Stata is obliged to store the
> macro and then interpret it each time it is referenced. However, you
> would have to strain to tell the difference in timings. But remember:
> Stata is not a compiler! Interpretation always implies an overhead, just
> that in many cases it is negligible.
>
> On a style point, I would not use a local macro in this example. I can't
> see what real gain there is in terms of making the code more readable or
> comprehensible, setting aside the efficiency issue.
>
> On a larger issue, -if- is always less efficient than an equivalent -in-
> when there is a direct mapping between statements.
> What do I mean by that?
>
> Suppose you know that there is a single observation, say 5890, for which
> -id- is 80.
>
> Then you could and should code
>
>     replace month = 1 in 5890
>     replace year = 1996 in 5890
>     replace failed= 1 in 5890
>
> if efficiency were your only concern. Given a qualifier, -in 5890-,
> Stata goes straight there, does the work, and bails out. Given a
> qualifier, say -if id == 80-, Stata respects it the slow and stupid way
> and tests every observation to see whether that condition is true or
> false. (It never does the sort of smart thing that people are good at,
> such as noticing whenever observations are ordered by -id- and taking
> that into account.) So, for equivalent actions, -if- is much slower than
> -in-.
>
> This principle is sometimes codified on Statalist, tongue in cheek, as
> Blasnik's Law, because Michael Blasnik has done more than anyone else to
> publicise it.
>
> However,
>
> 1. Efficiency should never be your only concern. Code with -if id == 80-
> is much more transparent than code with -in 5890-. Also, get the
> observation number wrong or mess up the sort order and you have
> introduced a hard-to-find bug.
>
> 2. The "suppose" is a big one. How do you find out the observation
> number if you don't know? You could do something like this
>
>     gen long obsno = _n
>     su obsno if id == 80, meanonly
>     assert r(min) == r(max)
>     local where = r(min)
>     replace month = 1 in `where'
>
> etc.
>
> But you can see there is a trade-off here. You have to do more work
> beforehand to save work! In practice I would be most unlikely to bother.
> In general being clever like this will not help much and might involve
> extra work. Spending 2 minutes changing the code for 2 ms less machine
> time is usually dopey unless you know that you are going to use that
> code many, many times.
>
> 3. I've taken Michael literally in his implication that only a single
> observation is involved.
> The test above
>
>     assert r(min) == r(max)
>
> tests whether that is so.
>
> At worst, the observations satisfying the -if- don't occur in a single
> block so that -in- is not applicable to the data as they stand. (In
> principle, that is always fixed by -sort-ing. Again in practice, there
> is a trade-off in that -sort-ing may take up considerable machine time
> itself.)
>
> Nick
> n.j.cox@durham.ac.uk
>
> (In a later post, Martin introduced what I think is another red herring
> by talking about dialogs. If you care about machine time, don't use
> dialogs.)
>
> Martin Weiss
>
> -replace- expects "oldvar =exp", so no, I do not think there is a more
> efficient way. Multiple instances of the same -if- qualifier always make
> it advisable to throw it into a -local-
>
>     local mycond " if id==80"
>     replace month = 1 `mycond'
>     replace year = 1996 `mycond'
>     replace failed= 1 `mycond'
>
> Michael McCulloch
>
> As part of a data audit, I'm recording some changes in my project
> do-file. Would there be a more efficient way to code the following
> changes, all of which involve the same observation?
>
>     replace month = 1 if id==80
>     replace year = 1996 if id==80
>     replace failed= 1 if id==80
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
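
For anyone who wants to see the -if- versus -in- difference described in the thread, a rough timing check might look like the sketch below. It is only an illustration under stated assumptions: the toy dataset, its size of one million observations, and the choice of timer numbers 1 and 2 are all made up here and are not from the thread. Because the toy -id- is simply the observation number, -id- 80 sits in observation 80, which will not hold in real data.

```stata
* Hypothetical toy data (an assumption, not from the thread):
* one million observations, with id equal to the observation number
clear
set obs 1000000
generate long id = _n
generate byte month = 4

timer clear

* -if-: Stata evaluates the condition for every observation
timer on 1
replace month = 1 if id == 80
timer off 1

* Put the old value back so both runs make the same change
replace month = 4 in 80

* -in-: Stata goes straight to observation 80
timer on 2
replace month = 1 in 80
timer off 2

* Report elapsed time for timer 1 (-if-) and timer 2 (-in-)
timer list
```

On a single -replace- the absolute difference is likely to be tiny, which is consistent with the advice in the thread to prefer the more transparent -if id == 80- unless the code will run many, many times.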

**Follow-Ups**:

- **st: RE: RE: RE: Efficient coding with -replace-** *From:* "Elizabeth Allred" <Lizard@hsph.harvard.edu>

**References**:

- **st: Efficient coding with -replace-** *From:* Michael McCulloch <mm@pinest.org>
- **st: RE: Efficient coding with -replace-** *From:* "Martin Weiss" <martin.weiss1@gmx.de>
- **st: RE: RE: Efficient coding with -replace-** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **st: RE: RE: Efficient coding with -replace-** *From:* "Elizabeth Allred" <Lizard@hsph.harvard.edu>
