[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
wgould@stata.com (William Gould, StataCorp LP) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: egen and Mata |

Date |
Wed, 16 Jul 2008 09:25:00 -0500 |

Paulo Guimaraes <guimaraes@moore.sc.edu> reports, > [...] I tried to code in Mata the equivalent to the following Stata > command: > > . egen newvar=mean(var), by(id) His goal was to see if he could write code in Mata that would be faster then -egen- (an ado-file), and he failed. He made two attempts. He reported these results, and I have duplicated them: ---- Timings by ---- Ratio Guimaraes Gould Gould/Guimaraes ----------------------------------------------------------- egen 0.59 0.24 .41 Attempt 1 65.81 93.94 1.43 (1) Attempt 2 0.86 0.40 .47 ----------------------------------------------------------- Notes: 1. Gould's attempt to reproduce Attempt 1 also resulted in incorrect results. 2. Times are in seconds. 3. Evidently I (Gould) have a faster computer than Paulo, although mine is a middle of the road computer by modern standards. Mine is an Intel 2.13 GHz Core 2 Duo running 64-bit Stata under Linux. One of Paulo's questions was why Attempt 1 is running so slowly. I haven't looked into that because, in addition to running slowly, Attempt 1 is producing incorrect results on my computer, and I assume it is doing the same on Paulo's. The program has a bug. Attempt 2, on the other hand, works perfectly and, in fact, is running pretty quickly. Paulo finds Attempt 2 takes 1.46 times as long as egen, and on my comp[uter, the multiplier is 1.15. Paulo wants to know how speed up Attempt 2, too. Answer: I don't know how. I made two attempts and produced code that I considered prettier but in fact ran more slowly. Actually, I wasn't surprised by that. My expectation was based on a comparison of what is true and what is theoretically possible in comparisons of C and Mata. Mata produces P-code, which must be interpreted. C produces machine language. Mata's low-level functions (i=0, i++, etc.) run 2 to 4 times slower than compiled C code. Said differently, they run at an overhead of of 1:1 to 3:1. One-to-one (1:1) means that, to execute one machine instruction on behalf of the user, Mata executes one additional instruction just to figure out what needs to be done. Three-to-one means three additional instructions are executed to execute the one instruction on behalf of the user. 1:1 may seem like a theoretical lower bound, but it is not. Consider a high-level function such as matrix inversion. It might take the Mata P-code interpreter 5 or even 10 machine instructions to figure out what needs to be done, and then it would execute a million machine instructions that were needed for the problem, the same million instructions C would have to execute, too. Thus, the overhead would be .000005:1. But for very low level instructions for which there is a one-to-one matching with the instruction set on the chip, such as i++, 1:1 is a theoretical lower bound. So why not produce machine language rather than P-code? Because in that case, the compiled code would not be portable. All the above has to do with a comparison of Mata and C. Paulo is comparing Mata to ado-code, however. The ado-code to which Paulo is comparing, however, is not typical or rather, it is what should be typical when ado-code is used for the appropriate problem. In this case, most of the ado-code time is being spent in just a few highly efficient Stata commands which themselves are implemented in C. The problem Paulo set up, . egen newvar=mean(var), by(id) (1) could have been solved via . sort id . by id: gen newvar = sum(var)/_n (2) . by id: replace newvar = newvar[_N] On my computer, the time required for solution (1) was 0.24 seconds. Solution (2) would have taken 0.11 seconds. In fact, solution (2) is exactly what -egen- implements. So -egen- is running at 45.8% efficiency. What could have been done by Stata's C code in .11 seconds, took -egen- .24 seconds. As an aside, let me add that -sort-, -by id:-, -gen-, and -replace- are four features we have worked very hard on to make as efficient as possible. There are 4,901 lines of C code involved in implementing these functions for the problem given. In Paulo's Alternative 2, he wrote 27 lines of code, and on my computer, it ran in .47 seconds. Thus, the ratio is .47/.11 = 4.27. This is about the ratio we expect to see on low-level code. Paulo did as well as could be expected. Note well, when Mata is used on problems requiring matrix inversion or other heavy lifting, which includes implementation of statistical commands for which Mata was designed, this multiplier of 4.27 falls, and can even fall to 2 or even just a little more than 1. That is why we use Mata for our statistical development. The fact remains, however, that for problems where the solution involves only Mata's low-level features, we expect a multiplier of 4. So when I saw Paulo's question, I knew I was unlikely to have a better answer for him. Nonetheless, I tried. Here, for Paulo's information, was my final result: void function mean_by2(string scalar var, string scalar groupid) { real scalar i, j, j0, j1, sum, mean real colvector id, y, p, y2 transmoprhic info st_view(id, ., groupid) st_view(y, ., var) p = order(id, 1) info=panelsetup(my_y, 1) st_view(y2, ., st_addvar("double", "_y2")) for (i=1; i<=rows(info); i++) { j0 = info[i, 1] j1 = info[i, 2] sum = 0 mean = mean(my_y[|j0\j1|]) for (j=j0; j<=j1; j++) y2[p[j]] = mean } } My embarrassment is that this took 0.77 seconds to run on my computer, versus .47 seconds for Paulo's Attempt 2. I am nonetheless convinced that a combination of Paulo's code and mine could run faster. Who knows, we might even get down to .40 seconds from Paulo's .47. Paulo, my one comment about your Attempt 2 is putting all the data together in one matrix and then sorting it. In my attempt, I keep the data in separate vectors, and rather than using sort(), I use order() to obtain the permutation vector and then use it where I need it. My goal was to prevent coding things like data[,2], which requires exectracting the second column from a matrix, and will require extra time to execute. -- Bill wgould@stata.com * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: egen and Mata***From:*"Stas Kolenikov" <skolenik@gmail.com>

**Re: st: egen and Mata***From:*Paulo Guimaraes <guimaraes@moore.sc.edu>

**Re: st: egen and Mata***From:*David Airey <david.airey@Vanderbilt.Edu>

- Prev by Date:
**st: RE: Panel data and sparse data** - Next by Date:
**st: creating dummy variables out of categorical variables** - Previous by thread:
**Re: st: egen and Mata** - Next by thread:
**Re: st: egen and Mata** - Index(es):

© Copyright 1996–2016 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |