Stata The Stata listserver

st: collapsing efficiently

From   "Steven Stillman (LMPG)" <>
To   "'statalist'" <>
Subject   st: collapsing efficiently
Date   Thu, 18 Dec 2003 19:23:32 +1300

I am collapsing individual/quarter data down to yearly population counts for
a number of variables (around 20) for various groups (this part isn't
important).  This is a large dataset of about 20,000 obs per quarter * 64
quarters.  My full dataset is around 330m.

Ideally, I would do this using the command:
collapse (sum) varlist [pw=weight], by(group year) fast

Unfortunately, even when I drop all variables from my dataset besides the
ones being collapsed and allocate my full system memory of 500m, I get an
error message that not enough memory is available.  I believe this occurs
because collapse uses doubles internally and creates its new temporary
variables before deleting the old ones.

I have gotten around this using the following sequence of commands:

(forgive my use of for, old habits die hard; the three lines below are all
one command)

for var varlist: egen float temp = sum(X*weight), by(group year) \ 
qui replace X = temp \ 
qui drop temp

bys year group: keep if _n==1
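
For readers more used to foreach than to for, the same workaround can be sketched as follows. This is only an illustration of the steps above, not a different method: the local macro vars and the names x1 x2 x3 are hypothetical placeholders for the roughly 20 variables being totalled, and egen's total() function is assumed to be available (newer Stata releases use it in place of egen's sum()).

```stata
* Sketch only: x1 x2 x3 are hypothetical stand-ins for the real varlist.
local vars "x1 x2 x3"

foreach v of local vars {
    tempvar tot
    * weighted total of `v' within each group/year cell
    egen float `tot' = total(`v' * weight), by(group year)
    quietly replace `v' = `tot'
    drop `tot'
}

* every observation in a cell now holds the same totals; keep one of them
bysort year group: keep if _n == 1
```

Like the original, this still creates one extra full-length variable at a time, so it shares the same memory and speed costs.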

This does exactly what I need but is tediously slow.  Because egen writes
the group total to every observation, I am storing lots of duplicate values
that I have no need for.

I am pretty sure that this same idea can be used by looping over groups,
calculating the sum, and storing this in only one observation per group.  I
haven't been able to figure out exactly how to do this myself and am hoping
someone else will quickly see the light here.  One problem is that summarize
cannot be used to calculate the sum because it doesn't take non-integer
weights.
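
One possible way to get one total per group without an explicit loop over groups and without summarize is sketched below. Note that it swaps in a different technique from the one described above: it uses by: together with the running-sum function sum() to overwrite each variable in place, so no extra variable is created, and then keeps the last observation of each cell. The local macro vars and the names x1 x2 x3 are hypothetical placeholders; weight, group, and year are the names used earlier in this message.

```stata
* Sketch only: x1 x2 x3 are hypothetical stand-ins for the real varlist.
local vars "x1 x2 x3"

sort year group
foreach v of local vars {
    * sum() is a running sum, so within each year/group cell the last
    * observation ends up holding the weighted total for that cell
    quietly by year group: replace `v' = sum(`v' * weight)
}

* keep the last observation of each cell, which carries the totals
by year group: keep if _n == _N
```

Two caveats under this sketch: sum() treats missing values as zero, and totals stored in float variables keep only about seven significant digits, so whether the results match collapse exactly would need checking on the real data.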

thanks for any help,

Steven Stillman - Senior Research Economist
Labour Market Policy Group - Department of Labour
PO Box 3705 - Wellington, New Zealand
Tel: (64)4-915-4076 - Fax: (64)4-915-4040

The information contained in this document is intended only for the
 addressee and is not necessarily the views nor the official 
communication of the Department of Labour.  All final/official papers 
which are sent from the Department will be sent by non-electronic
means, on appropriate letterhead, signed by authorised personnel.