Stata The Stata listserver

Re: st: collapsing efficiently

From   Ulrich Kohler
Subject   Re: st: collapsing efficiently
Date   Thu, 18 Dec 2003 09:54:01 +0100

Steven Stillman (LMPG) wrote:
> I am collapsing individual/quarter data down to yearly population counts
> for a number of variables (around 20) for various groups (this part isn't
> important).  This is a large dataset of about 20,000 obs per quarter * 64
> quarters.  My full dataset is around 330m.
> Ideally, I would do this using the command:
> collapse (sum) varlist [pw=weight], by(group year) fast
> Unfortunately, even when I drop all variables from my dataset besides the
> ones being collapsed and allocate my full system memory of 500m, I get an
> error message that not enough memory is available.  I believe this occurs
> because of collapse's internal use of doubles and its creation of new temp
> variables before deleting the old ones.
> I have gotten around this using the following sequence of commands:
> [for var varlist: (forgive my use of for, old habits die hard)
> egen float temp = sum(X*weight), by(group year) \
> qui replace X = temp \
> qui drop temp] (brackets are just to indicate this is all one command)
> bys year group: keep if _n==1
> This does exactly what I need but is tediously slow.  My use of egen means
> I am storing lots of unnecessary information (i.e., duplicate records) that I
> have no need for.

This does the same, but faster. I would be curious how large the difference in 
speed is. 

sort year group
foreach var of varlist myvars* {
	// sum() is a running sum, so the last obs in each group holds the total
	by year group: replace `var' = sum(`var'*weight)
}
by year group: keep if _n==_N
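
A minimal sketch for checking this on a small scale, assuming made-up data: the variables x1 and x2 and the five observations below are invented for illustration and stand in for the real varlist. It builds the collapse result as a reference, applies the running-sum approach with a timer around it, and then uses cf to confirm that both give the same weighted sums. On data this small collapse will of course not run out of memory; the point is only to verify the logic and show where the timing calls would go.

```stata
clear
input year group x1 x2 weight
2000 1 1 0 2
2000 1 0 1 3
2000 2 1 1 1
2001 1 1 0 4
2001 1 1 1 1
end

* Reference: what collapse produces on this toy data
preserve
collapse (sum) x1 x2 [pw=weight], by(group year)
tempfile ref
save `ref'
restore

* Running-sum approach, timed with timer 1
timer clear 1
timer on 1
sort year group
foreach var of varlist x1 x2 {
	by year group: replace `var' = sum(`var'*weight)
}
by year group: keep if _n==_N
timer off 1
timer list 1

* Expected weighted sums:
*   (2000,1) x1=2 x2=3;  (2000,2) x1=1 x2=1;  (2001,1) x1=5 x2=1
drop weight
sort group year
cf _all using `ref'
list
```

If cf reports no differences, the two approaches agree. To compare speed on the real data, the same timer on / timer off pair could be wrapped around the egen version under a different timer number.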


