

From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Is -collapse- the Stata's fastest routine to summarize data sets?
Date: Fri, 9 Jul 2010 07:44:53 -0700

A quick question related to this: I note that many people use -timer- to get timings. I have sometimes used rmsg (-set rmsg on-), which reports the timing after each command. Would this be simpler?

Tony

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eric Booth [ebooth@ppri.tamu.edu]
Sent: Thursday, July 08, 2010 5:36 PM
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?

<>

Tiago:

When summarizing a large dataset, I've found the program that runs the fastest for me is -tabout- (from SSC). I don't know enough about what's going on in the tabout ado-file to know why it's faster, and it may not be faster for all types of summary tables, but when I changed from -collapse-/-contract- to -tabout- in my do-file there was a huge time savings when working with a dataset of about 60 million obs.

For an illustration, here's a speed comparison for creating the same summary table with these 2 packages:

******************!
clear all
** | change -set mem- and -expand- below to fit your system | **
set mem 12g
sysuse auto
cap which tabout
if _rc ssc install tabout

**create a large dataset**
expand 950000
desc, sh
recode rep78 (.=9)

**test collapse vs. tabout**
// 1. collapse
ds make rep78, not
local vars `r(varlist)'
**
timer clear 1
timer on 1
collapse (sum) `vars' , by(rep78)
timer off 1
save master

// 2. tabout
local vars: subinstr local vars " " " sum ", all
di "`vars'"
**
timer clear 2
timer on 2
tabout rep78 using test.xls, replace sum c(sum `vars')
timer off 2

**make sure these are creating the same summary tables**
cf _all using master.dta, verbose all
**
timer list
******************!

/*
timer list
   1:  240.41 /  1 =  240.4130
   2:    0.43 /  1 =    0.4340
*/

4 minutes for -collapse- versus less than a second for the -tabout- summary table (using Stata 11.1 MP on Mac OS X).

Good luck.

~ Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754

On Jul 8, 2010, at 9:02 AM, Tiago V. Pereira wrote:

> Dear Statalister,
>
> I am eager to know any faster alternatives to -collapse-, because I have
> to summarize relatively large data sets for a simulation study. -profiler-
> is telling me that most of the computation burden comes from -collapse-.
> Do you know (have) any faster alternative? Perhaps a plug-in?
>
> Thanks!
>
> Tiago

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
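As a side-by-side for the -rmsg- versus -timer- question raised at the top of this message, here is a minimal sketch and not part of the original posts. It assumes the shipped auto dataset and an arbitrary pair of variables (price, mpg) chosen only for illustration. -set rmsg on- prints a return message with the elapsed time after every command, while -timer- brackets a chosen block of commands and leaves the result in r() for later use.

```stata
* Sketch only: the dataset and variable list are illustrative assumptions.

* Option 1: -rmsg- reports the elapsed time after each command that follows
sysuse auto, clear
set rmsg on
collapse (sum) price mpg, by(rep78)
set rmsg off

* Option 2: -timer- times only the block between -timer on- and -timer off-
sysuse auto, clear
timer clear 1
timer on 1
collapse (sum) price mpg, by(rep78)
timer off 1
timer list 1

* -timer list- leaves the elapsed seconds for timer 1 in r(t1)
display "collapse took " r(t1) " seconds"
```

-rmsg- needs no bracketing, so it is the lighter option for interactive work. -timer- is more convenient inside a do-file when several commands should count as one timing or when the number is needed afterwards.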

