# Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?

- From: Eric Booth
- Date: Fri, 9 Jul 2010 14:26:15 +0000

```

If you want to collapse by several categorical vars with -tabout-, it's not as straightforward as with -collapse-. You can create a single variable that identifies each combination of the n categorical variables that occurs in the data, and then -tabout- by that combined variable. For example,

******************!
clear
sysuse auto
**install -tabout- from SSC if it isn't already installed**
cap which tabout
if _rc ssc install tabout

**create n categorical vars**
recode rep78 (.=0)
lab def rep78 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 0 "zero/miss", modify
lab val rep78 rep78
egen price2 = cut(price), group(4) label
drop price

//  1. collapse
ds make rep78 foreign price2, not
local vars `r(varlist)'
**
preserve
collapse (sum) `vars'  , by(rep78 price2 foreign)
outsheet using collapsed.csv, comma replace
restore

local vars: subinstr local vars " " " sum ", all
di "`vars'"
**
tabout rep78 price2 foreign using taboutex.csv, replace sum c(sum `vars') style(csv)  h2(THIS ISN'T WHAT YOU WANT |)

preserve
**decode your categorical vars**
foreach v in rep78 price2 foreign {
decode `v', g(`v'a)
drop `v'
rename `v'a `v'
}
**combine your categorical vars into one var**
g categories = price2 + " - " + rep78 + " - " + foreign
ta categories
**
tabout categories using taboutex.csv, append sum c(sum `vars') h2(THIS IS WHAT YOU WANT|) lines(double) style(csv)
restore
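** Alternative sketch (an addition, not part of the original reply):
** -egen, group()- can build the same kind of combined indicator in one
** step, without decoding each variable. group(price2 rep78 foreign)
** numbers every combination that occurs in the data, and the -label-
** option labels those numbers using the variables' value labels.
** The name categories2 and the h2() header text are hypothetical.
egen categories2 = group(price2 rep78 foreign), label
ta categories2
tabout categories2 using taboutex.csv, append sum c(sum `vars') h2(ALTERNATIVE: EGEN GROUP|) lines(double) style(csv)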
******************!

~ Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
On Jul 8, 2010, at 6:47 PM, Tiago V. Pereira wrote:

> Many, many thanks Eric!
>
> Yes, -tabout- really seems to be much faster than -collapse-. However, I
> could not figure out how to make it work when one has n categorical
> variables, and wants to summarize continuous variables taking all possible
> combinations of the n categorical variables.
>
> -collapse- does that using the by() option.
>
> Thanks again!
>
> Tiago
```