

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to parallelize Mata (or steal the performance of built-in -tab, summarize-)
Date: Mon, 2 Apr 2012 23:11:22 +0100

I will look at it tomorrow.

2012/4/2 László Sándor <sandorl@gmail.com>:
> Nick,
>
> thanks, I did follow up with your post. Sadly, I could not easily get
> -by- working, or to be precise, to use the variables that it
> generated. Below I have an attempt, if I can take liberty with your
> time and expect you to parse it, I am grateful for comments to get it
> working -- the indexing must be off. It tries to average two (x_r and
> y_r) or three (y2_r extra) variables. It generates too large values
> for some bins (i.e. from U[0,1] variables some averages become larger
> than 20.)
>
> I am happy if someone from StataCorp follows up too! :)
>
> Thanks,
>
> László
>
> tempvar wsum tag ones
> g byte `ones' = 1
>
> if ("`y2_var'"!="") local y2 y2
> else local y2 ""
>
> if ("`weight1'"!="") g `wsum' = sum(`weight1') if `touse'
> else g `wsum' = sum(`ones') if `touse'
>
> sort `x_q'
> by `x_q': g byte `tag' = _N if `touse'
>
> foreach v in x y `y2' {
>     if "`weight1'"!=""{
>         by `x_q': g ``v'_mean' = sum(``v'_r'*`weight1') if `touse'
>         by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
>     }
>
>     else {
>         by `x_q': g ``v'_mean' = sum(``v'_r') if `touse'
>         by `x_q': replace ``v'_mean' = ``v'_mean'/`wsum' if `tag' & `touse'
>     }
> }
>
> On Mon, Apr 2, 2012 at 3:36 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>
>> We are back to the questions you asked a week ago. Mostly this is for
>> StataCorp. Otherwise please see again my answers at
>>
>> http://www.stata.com/statalist/archive/2012-03/msg01144.html
>>
>> I've had dramatic speed-ups with Mata -- my record is reducing
>> execution time from 5 days to 2 minutes, but that was partly because
>> my original code was so dumb -- but I've not tried anything like the
>> stuff you were using.
>>
>> -tabulate, summarize- is compiled C code. I think the nearest you can
>> get is by using -by:- as explained in the post just quoted.
>>
>> Nick
>>
>> 2012/4/2 László Sándor <sandorl@gmail.com>:
>> > Hi all,
>> >
>> > I had several questions recently on this list about compiling Mata
>> > code. I still could not deal with generating the compile time locals
>> > with loops, but I typed them out and compiled. Now I had my test runs
>> > but they are surprising. Let me ask you why:
>> >
>> > My basic problem was to do a fast "collapse" to make binned scatter
>> > plots. Collapse was unacceptably slow, probably because of the
>> > necessary preserve-restore cycles, or inefficient coding of collapse
>> > (for its general purpose).
>> >
>> > I already had a version that parsed a log of -tabulate, summarize-.
>> > Yes, it is as much of a hack as it sounds like. I was not expecting
>> > this to be fast, at least because of the file I/O and the parsing.
>> >
>> > Now I built a Mata function that "collapses" into new variables with
>> > leaving the data intact otherwise. For this I used Ben Jann's
>> > -mf_mm_collapse-, and compiled all the necessary functions myself in
>> > the ado file.
>> >
>> > And the test run with 100 million observations told me it was slower
>> > than the hack. Before I give up and claim the hack unbeatable, I have
>> > one suspicion. I had the test run on Stata 12 MP on a cluster, with 12
>> > cores. Perhaps -tabulate- used all of them, and my code did not.
>> >
>> > Are there guidelines how to speed up Mata in this situation (if it is
>> > not MP-aware to begin with?).
>> >
>> > Or, tentatively, can I ask for some guidance about the magic of
>> > -tabulate, summarize-? Is that magic accessible/reproducible without
>> > just logging its output?
>> >
>> > Thanks,
>> >
>> > Laszlo

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
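For readers following the thread, here is a minimal sketch of the -by:- route to binned means that Nick points to. It is not code from either poster, and every name in it (x, y, w, xq, touse, wsum, tag) is a hypothetical stand-in on toy data. It addresses two things that appear to go wrong in the quoted attempt: `wsum' is built as a running sum over the whole dataset rather than within each bin, and `tag' holds the group size _N rather than marking one observation per bin, so the division is applied to every observation with the wrong denominator.

```stata
* Hypothetical toy data; none of these names come from the thread
clear
set seed 12345
set obs 1000
generate double x = runiform()
generate double y = runiform()
* Stand-in weight and estimation-sample marker
generate double w = 1
generate byte touse = 1
* 20 equal-width bins on [0,1)
generate byte xq = floor(20 * x)

* Sort once; running sums then restart within each touse/bin group
bysort touse xq: generate double wsum = sum(w)

foreach v in x y {
    * Running weighted sum within the bin
    by touse xq: generate double `v'_mean = sum(`v' * w)
    * Last running value over the bin's total weight is the weighted mean
    by touse xq: replace `v'_mean = `v'_mean[_N] / wsum[_N]
}

* Flag one observation per bin, e.g. for plotting the binned scatter
by touse xq: generate byte tag = (_n == _N) & touse

summarize x_mean y_mean if tag
```

With U[0,1) inputs every bin mean should stay inside [0,1]. The sketch assumes no missing values in x, y, or w: sum() treats missing as zero, so a missing value would still contribute its weight to the denominator unless it is excluded through touse first.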

