
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date   Mon, 2 Apr 2012 23:11:22 +0100

I will look at it tomorrow.

2012/4/2 László Sándor <sandorl@gmail.com>:
> Nick,
>
> thanks, I did follow up on your post. Sadly, I could not easily get
> -by- working or, to be precise, use the variables that it
> generated. Below is my attempt. If I may take the liberty with your
> time and ask you to parse it, I would be grateful for comments on
> getting it working; the indexing must be off. It tries to average two
> (x_r and y_r) or three (plus y2_r) variables. It generates values that
> are too large for some bins (i.e., for U[0,1] variables, some averages
> exceed 20).
>
> I am happy if someone from StataCorp follows up too! :)
>
> Thanks,
>
> László
>
> tempvar wsum tag ones
> g byte `ones' = 1
>
>
> if ("`y2_var'"!="") local y2 y2
> else local y2 ""
>
>
> if ("`weight1'"!="") g `wsum' = sum(`weight1')  if `touse'
> else g `wsum' = sum(`ones')  if `touse'
>
>
> sort `x_q'
> by `x_q': g byte `tag' = _N if `touse'
>
> foreach v in x y `y2' {
>     if "`weight1'"!="" {
>         by `x_q': g ``v'_mean' = sum(``v'_r'*`weight1')  if `touse'
>         by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
>     }
>     else {
>         by `x_q': g ``v'_mean' = sum(``v'_r') if `touse'
>         by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
>     }
> }
>
>
> On Mon, Apr 2, 2012 at 3:36 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>
>> We are back to the questions you asked a week ago. Mostly this is for
>> StataCorp. Otherwise please see again my answers at
>>
>> http://www.stata.com/statalist/archive/2012-03/msg01144.html
>>
>> I've had dramatic speed-ups with Mata -- my record is reducing
>> execution time from 5 days to 2 minutes, but that was partly because
>> my original code was so dumb -- but I've not tried anything like the
>> stuff you were using.
>>
>> -tabulate, summarize- is compiled C code. I think the nearest you can
>> get is by using -by:- as explained in the post just quoted.
>>
>> Nick
>>
>> 2012/4/2 László Sándor <sandorl@gmail.com>:
>> > Hi all,
>> >
>> > I recently asked several questions on this list about compiling Mata
>> > code. I still could not generate the compile-time locals with loops,
>> > so I typed them out and compiled. I have now done my test runs, and
>> > the results are surprising. Let me ask you why:
>> >
>> > My basic problem was to do a fast "collapse" to make binned scatter
>> > plots. Collapse was unacceptably slow, probably because of the
>> > necessary preserve-restore cycles, or inefficient coding of collapse
>> > (for its general purpose).
>> >
>> > I already had a version that parsed a log of -tabulate, summarize-.
>> > Yes, it is as much of a hack as it sounds like. I was not expecting
>> > this to be fast, at least because of the file I/O and the parsing.
>> >
>> > Now I have built a Mata function that "collapses" into new variables
>> > while leaving the data otherwise intact. For this I used Ben Jann's
>> > -mf_mm_collapse-, and compiled all the necessary functions myself in
>> > the ado file.
>> >
>> > And the test run with 100 million observations told me it was slower
>> > than the hack. Before I give up and claim the hack unbeatable, I have
>> > one suspicion. I had the test run on Stata 12 MP on a cluster, with 12
>> > cores. Perhaps -tabulate- used all of them, and my code did not.
>> >
>> > Are there guidelines on how to speed up Mata in this situation (if it
>> > is not MP-aware to begin with)?
>> >
>> > Or, tentatively, can I ask for some guidance about the magic of
>> > -tabulate, summarize-? Is that magic accessible/reproducible without
>> > just logging its output?
>> >
>> > Thanks,
>> >
>> > Laszlo
>> > *
>> > *   For searches and help try:
>> > *   http://www.stata.com/help.cgi?search
>> > *   http://www.stata.com/support/statalist/faq
>> > *   http://www.ats.ucla.edu/stat/stata/
>>
>
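The -by:- approach mentioned in the thread can be sketched as follows. This is a minimal illustration, not code from any of the posters: the variable names (bin, x, w, touse, wsum, xsum, xmean, tag) and the simulated data are all hypothetical. It differs from the attempt quoted above in two ways that may explain the oversized averages. First, the weight total is accumulated within each bin, after sorting, rather than as one running sum over the whole dataset. Second, the division uses each bin's final running totals. In the quoted code, `tag' holds _N, the group size, so the condition if `tag' is true on every observation instead of picking out one observation per bin.

```stata
* Sketch only: every name here (bin, x, w, touse, wsum, xsum, xmean, tag)
* is hypothetical and the data are simulated.
clear
set seed 12345
set obs 1000
generate byte bin = 1 + floor(10 * runiform())  // bin identifier, 1 to 10
generate double x = runiform()                  // U[0,1) variable to average
generate double w = 1 + runiform()              // strictly positive weights
generate byte touse = runiform() < 0.9          // sample marker, about 10% excluded

* Putting touse in the by-list keeps excluded rows out of each bin's group,
* so _N below refers to the last included observation of the bin.
bysort touse bin: generate double wsum = sum(w) if touse
by touse bin: generate double xsum = sum(x * w) if touse

* Within a by-group, [_N] picks up the group's final running total.
by touse bin: generate double xmean = xsum[_N] / wsum[_N] if touse

* Optional: flag one observation per bin, e.g. for plotting.
by touse bin: generate byte tag = (_n == _N) & touse
```

With all weights equal to 1 the same lines give the unweighted bin mean, and the data in memory are left intact, so no preserve-restore cycle is needed.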


