

From: László Sándor <sandorl@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date: Mon, 2 Apr 2012 17:23:44 -0400

Nick, thanks, I did follow up with your post. Sadly, I could not easily get -by- working, or to be precise, to use the variables that it generated.

Below I have an attempt, if I can take liberty with your time and expect you to parse it, I am grateful for comments to get it working -- the indexing must be off. It tries to average two (x_r and y_r) or three (y2_r extra) variables. It generates too large values for some bins (i.e. from U[0,1] variables some averages become larger than 20.)

I am happy if someone from StataCorp follows up too! :)

Thanks,

László

    tempvar wsum tag ones
    g byte `ones' = 1
    if ("`y2_var'"!="") local y2 y2
    else local y2 ""
    if ("`weight1'"!="") g `wsum' = sum(`weight1') if `touse'
    else g `wsum' = sum(`ones') if `touse'
    sort `x_q'
    by `x_q': g byte `tag' = _N if `touse'
    foreach v in x y `y2' {
        if "`weight1'"!=""{
            by `x_q': g ``v'_mean' = sum(``v'_r'*`weight1') if `touse'
            by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
        }
        else {
            by `x_q': g ``v'_mean' = sum(``v'_r') if `touse'
            by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
        }
    }

On Mon, Apr 2, 2012 at 3:36 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> We are back to the questions you asked a week ago. Mostly this is for
> StataCorp. Otherwise please see again my answers at
>
> http://www.stata.com/statalist/archive/2012-03/msg01144.html
>
> I've had dramatic speed-ups with Mata -- my record is reducing
> execution time from 5 days to 2 minutes, but that was partly because
> my original code was so dumb -- but I've not tried anything like the
> stuff you were using.
>
> -tabulate, summarize- is compiled C code. I think the nearest you can
> get is by using -by:- as explained in the post just quoted.
>
> Nick
>
> 2012/4/2 László Sándor <sandorl@gmail.com>:
> > Hi all,
> >
> > I had several questions recently on this list about compiling Mata
> > code. I still could not deal with generating the compile time locals
> > with loops, but I typed them out and compiled. Now I had my test runs
> > but they are surprising. Let me ask you why:
> >
> > My basic problem was to do a fast "collapse" to make binned scatter
> > plots. Collapse was unacceptably slow, probably because of the
> > necessary preserve-restore cycles, or inefficient coding of collapse
> > (for its general purpose).
> >
> > I already had a version that parsed a log of -tabulate, summarize-.
> > Yes, it is as much of a hack as it sounds like. I was not expecting
> > this to be fast, at least because of the file I/O and the parsing.
> >
> > Now I built a Mata function that "collapses" into new variables with
> > leaving the data intact otherwise. For this I used Ben Jann's
> > -mf_mm_collapse-, and compiled all the necessary functions myself in
> > the ado file.
> >
> > And the test run with 100 million observations told me it was slower
> > than the hack. Before I give up and claim the hack unbeatable, I have
> > one suspicion. I had the test run on Stata 12 MP on a cluster, with 12
> > cores. Perhaps -tabulate- used all of them, and my code did not.
> >
> > Are there guidelines how to speed up Mata in this situation (if it is
> > not MP-aware to begin with?).
> >
> > Or, tentatively, can I ask for some guidance about the magic of
> > -tabulate, summarize-? Is that magic accessible/reproducible without
> > just logging its output?
> >
> > Thanks,
> >
> > Laszlo

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
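A note on the -by:- attempt in this message: one plausible reading of the oversized averages is that `wsum' is a running sum over the whole dataset rather than within each bin, and that `tag' = _N is nonzero on every row, so each within-bin running sum is divided by an unrelated denominator. The sketch below is not the poster's code. It is a minimal illustration, on simulated data and with hypothetical plain variable names (x_q, x_r, y_r, w, touse), of weighted bin means built from running sums inside -by:- groups. It assumes no missing values and strictly positive weights.

```stata
* Simulated data; every name here is hypothetical, not from the original ado file.
clear
set seed 12345
set obs 1000
generate double x_r = runiform()
generate double y_r = runiform()
* Bin identifier, weights (all 1 means unweighted), and sample marker.
generate byte x_q = ceil(10 * x_r)
generate double w = 1
generate byte touse = 1

* Running weight total within each bin, restricted to the sample.
bysort touse x_q: generate double wsum = sum(w) if touse

foreach v in x y {
    * Running weighted sum over running weight total is a running mean;
    * on the last row of the bin it equals the bin mean.
    by touse x_q: generate double `v'_mean = sum(`v'_r * w) / wsum if touse
    * Copy the last row's value to every row of the bin.
    by touse x_q: replace `v'_mean = `v'_mean[_N] if touse
}

* Tag one row per bin, e.g. for plotting a binned scatter.
by touse x_q: generate byte tag = (_n == _N) if touse

* Sanity checks: means of U(0,1) draws lie in [0,1] and agree with -egen-.
egen double chk = mean(x_r) if touse, by(x_q)
assert inrange(x_mean, 0, 1) if touse
assert reldif(x_mean, chk) < 1e-12 if touse
```

Sorting on touse first keeps out-of-sample rows in their own groups, so the last row of every in-sample bin is guaranteed to carry the full running sum. The tagged rows can then be plotted with something like `twoway scatter y_mean x_mean if tag == 1`.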

**Follow-Ups**:
- **Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)** *From:* Nick Cox <njcoxstata@gmail.com>

**References**:
- **st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)** *From:* László Sándor <sandorl@gmail.com>
- **Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)** *From:* Nick Cox <njcoxstata@gmail.com>
