


Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?

From   Eric Booth <>
To   "<>" <>
Subject   Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?
Date   Fri, 9 Jul 2010 14:26:15 +0000


Collapsing by several categorical vars is not as straightforward with -tabout- as it is with -collapse-.  You can create a single variable that identifies every combination of the n categorical variables and then -tabout- by that combined variable.  For example:

sysuse auto
cap which tabout
if _rc ssc install tabout

**create n categorical vars**
recode rep78 (.=0)
lab def rep78 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 0 "zero/miss", modify
lab val rep78 rep78
egen price2 = cut(price), group(4) label
drop price

//  1. collapse
ds make rep78 foreign price2, not
local vars `r(varlist)'
collapse (sum) `vars'  , by(rep78 price2 foreign)
outsheet using collapsed.csv, comma replace

//  2.  tabout
local vars: subinstr local vars " " " sum ", all
di "`vars'"
tabout rep78 price2 foreign using taboutex.csv, replace sum c(sum `vars') style(csv)  h2(THIS ISN'T WHAT YOU WANT |)

**decode your categorical vars**
foreach v in rep78 price2 foreign {
	decode `v', g(`v'a)
	drop `v'
	rename `v'a `v'
}
**combine your categorical vars into one var**
g categories = price2 + " - " + rep78 + " - " + foreign
ta categories
tabout categories using taboutex.csv, append sum c(sum `vars') h2(THIS IS WHAT YOU WANT|) lines(double) style(csv)
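
A shorter route to the same combined variable, as a minimal sketch: -egen- with group() and the label option numbers every combination of the categorical vars and builds its value label from their existing labels, so the decode-and-concatenate step above can be skipped.  The file name taboutex2.csv and the choice of mpg and weight are only assumptions for illustration, and I have not timed this against -collapse-.

sysuse auto, clear
**same categorical vars as above**
recode rep78 (.=0)
lab def rep78 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 0 "zero/miss", modify
lab val rep78 rep78
egen price2 = cut(price), group(4) label
**one labeled indicator for all observed combinations**
egen categories = group(price2 rep78 foreign), label
ta categories
**taboutex2.csv is a placeholder file name**
tabout categories using taboutex2.csv, replace sum c(sum mpg sum weight) style(csv)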

~ Eric

Eric A. Booth
Public Policy Research Institute
Texas A&M University
Office: +979.845.6754
On Jul 8, 2010, at 6:47 PM, Tiago V. Pereira wrote:

> Many, many thanks Eric!
> Yes, -tabout- really seems to be much faster than -collapse-. However, I
> could not figure out how to make it work when one has n categorical
> variables, and wants to summarize continuous variables taking all possible
> combinations of the n categorical variables.
> -collapse- does that using the by() option.
> Thanks again!
> Tiago

