



Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date   Mon, 2 Apr 2012 20:36:52 +0100

We are back to the questions you asked a week ago. Mostly they are for
StataCorp to answer. Otherwise please see again my answers at

http://www.stata.com/statalist/archive/2012-03/msg01144.html

I've had dramatic speed-ups with Mata: my record is reducing
execution time from 5 days to 2 minutes, although that was partly
because my original code was so dumb. But I've not tried anything
like the stuff you were using.

-tabulate, summarize- is compiled C code. I think the nearest you can
get is by using -by:- as explained in the post linked above.
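
A minimal sketch of the -by:- idea, for illustration only and not
copied from the linked post. It assumes Stata's shipped auto dataset,
and the names bin, mean_y, mean_x and first are made up here. It puts
bin means into new variables without -collapse- or -preserve-/-restore-,
so the data stay intact apart from the sort order.

```stata
* Sketch under stated assumptions: auto data, 10 quantile bins of weight
sysuse auto, clear

* Assign each observation to a bin of the x variable
xtile bin = weight, nquantiles(10)

* Bin means stored as new variables; the original variables are untouched
bysort bin: egen double mean_y = mean(price)
bysort bin: egen double mean_x = mean(weight)

* Flag one observation per bin so each bin is plotted once
egen byte first = tag(bin)

* Binned scatter plot
twoway scatter mean_y mean_x if first
```

Whether this is fast enough on 100 million observations would need
to be timed; the sketch shows only the mechanics.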

Nick

2012/4/2 László Sándor <sandorl@gmail.com>:
> Hi all,
>
> I recently asked several questions on this list about compiling Mata
> code. I still could not generate the compile-time locals with loops,
> so I typed them out and compiled. I have now done my test runs, and
> the results are surprising. Let me ask you why:
>
> My basic problem was to do a fast "collapse" to make binned scatter
> plots. -collapse- was unacceptably slow, probably because of the
> necessary preserve-restore cycles, or because -collapse- is coded
> for general use rather than for speed.
>
> I already had a version that parsed a log of -tabulate, summarize-.
> Yes, it is as much of a hack as it sounds. I was not expecting
> this to be fast, if only because of the file I/O and the parsing.
>
> I have now built a Mata function that "collapses" into new variables
> while otherwise leaving the data intact. For this I used Ben Jann's
> -mf_mm_collapse- and compiled all the necessary functions myself in
> the ado file.
>
> The test run with 100 million observations told me it was slower
> than the hack. Before I give up and declare the hack unbeatable, I
> have one suspicion. I ran the test on Stata 12 MP on a cluster, with
> 12 cores. Perhaps -tabulate- used all of them, and my code did not.
>
> Are there guidelines on how to speed up Mata in this situation (if
> it is not MP-aware to begin with)?
>
> Or, tentatively, can I ask for some guidance about the magic of
> -tabulate, summarize-? Is that magic accessible/reproducible without
> just logging its output?
>
> Thanks,
>
> Laszlo
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


