Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?
Date   Fri, 9 Jul 2010 14:26:15 +0000

Collapsing by several categorical variables is not as straightforward with -tabout- as it is with -collapse-.  One workaround is to create a single variable that identifies every combination of the n categorical variables and then run -tabout- by that combined variable.  For example:

******************!
clear
sysuse auto
	cap which tabout
	if _rc ssc install tabout

**create n categorical vars**
recode rep78 (.=0)
lab def rep78 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 0 "zero/miss", modify
lab val rep78 rep78
egen price2 = cut(price), group(4) label
drop price


//  1. collapse
**list every variable except the id and the categorical vars**
ds make rep78 for price2, not
local vars `r(varlist)'
**
preserve
collapse (sum) `vars'  , by(rep78 price2 foreign)
outsheet using collapsed.csv, comma replace
restore

//  2.  tabout
**put "sum" between the names so that c() reads: sum var1 sum var2 ...**
local vars: subinstr local vars " " " sum ", all
di "`vars'"
**
tabout rep78 price2 foreign using taboutex.csv, replace sum c(sum `vars') style(csv)  h2(THIS ISN'T WHAT YOU WANT |)


preserve
**decode your categorical vars**
foreach v in rep78 price2 foreign {
	decode `v', g(`v'a)
	drop `v'
	rename `v'a `v'
	}
**combine your categorical vars into one var**
g categories = price2 + rep78 + " - " + foreign
ta categories
**
tabout categories using taboutex.csv, append sum c(sum `vars') h2(THIS IS WHAT YOU WANT|) lines(double) style(csv)
restore
******************!
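
A possible variant of the combining step, sketched here as an illustration rather than taken from the original post: -egen-'s group() function can build the combined identifier directly from the numeric categorical variables, so the decode/concatenate loop is not needed. The variable name combo and the file name taboutcombo.csv are hypothetical, only two summary variables are used to keep it short, and -tabout- is assumed to be installed already. The -tabout- call mirrors the one-variable call above; that it lays out the table the same way is an assumption.

```stata
clear
sysuse auto

**same categorical vars as in the example above**
recode rep78 (.=0)
egen price2 = cut(price), group(4) label

**one id per observed combination of the three categorical vars;**
**the label option builds value labels from the component values/labels**
egen combo = group(rep78 price2 foreign), label
ta combo

**tabout by the single combined variable**
tabout combo using taboutcombo.csv, replace sum c(sum mpg sum weight) style(csv)

**same totals via collapse, for comparison**
preserve
collapse (sum) mpg weight, by(combo)
list
restore
```

Note that group() only numbers the combinations that actually occur in the data, which is also what -collapse, by()- reports.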

~ Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754

On Jul 8, 2010, at 6:47 PM, Tiago V. Pereira wrote:

> Many, many thanks Eric!
> 
> Yes, -tabout- really seems to be much faster than -collapse-. However, I
> could not figure out how to make it work when one has n categorical
> variables, and wants to summarize continuous variables taking all possible
> combinations of the n categorical variables.
> 
> -collapse- does that using the by() option.
> 
> Thanks again!
> 
> Tiago

