Re: st: collapsing efficiently

From	Ulrich Kohler <[email protected]>
To	[email protected]
Subject	Re: st: collapsing efficiently
Date	Thu, 18 Dec 2003 09:54:01 +0100

Steven Stillman (LMPG) wrote:
> I am collapsing individual/quarter data down to yearly population counts
> for a number of variables (around 20) for various groups (this part isn't
> important).  This is a large dataset of about 20,000 obs per quarter * 64
> quarters.  My full dataset is around 330m.
>
> Ideally, I would do this using the command:
> collapse (sum) varlist [pw=weight], by(group year) fast
>
> Unfortunately, even when I drop all variables from my dataset besides the
> ones being collapsed and allocate my full system memory of 500m, I get an
> error message that not enough memory is available.  I believe this occurs
> because of collapse's internal use of doubles and its creation of new temp
> variables before deleting the old ones.
>
> I have gotten around this using the following sequence of commands:
>
> [for var varlist: (forgive my use of for, old habits die hard)
>
> egen float temp = sum(X*weight), by(group year) \
> qui replace X = temp \
> qui drop temp] (brackets are just to indicate this is all one command)
>
> bys year group: keep if _n==1
>
> This does exactly what I need but is tediously slow.  My use of egen means
> I am storing lots of unnecessary information (i.e., duplicate records) that I
> have no need for.

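The following should do the same, but faster, because it needs neither egen
nor a temporary variable. sum() is a running sum, so after the loop the last
observation of each year/group cell holds the weighted total; that is why the
final line keeps _n==_N rather than _n==1. I would be curious how large the
difference in speed is.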


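* myvars* stands for the list of variables to be summed
sort year group
foreach var of varlist myvars* {
	by year group: replace `var' = sum(`var'*weight)
}
* the last observation of each cell now holds the weighted total
by year group: keep if _n==_N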
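
To check that the running-sum approach reproduces collapse, one could compare the two on a small toy dataset first. The sketch below is only an illustration: the data and the names myvar1 and ref1 are made up, and it uses the newer merge 1:1 syntax, so it assumes Stata 11 or later.

```stata
* Toy data: 2 years x 2 groups, 3 observations per cell (all values hypothetical)
clear
set obs 12
generate year   = 2000 + mod(_n, 2)
generate group  = 1 + mod(floor(_n/2), 2)
generate weight = 1 + mod(_n, 3)
generate double myvar1 = _n

* Reference result from collapse, saved to a temporary file
preserve
collapse (sum) ref1 = myvar1 [pw=weight], by(year group) fast
tempfile ref
save `ref'
restore

* In-place running sum, then keep the last observation of each cell
sort year group
by year group: replace myvar1 = sum(myvar1*weight)
by year group: keep if _n == _N
keep year group myvar1

* The two results should agree cell by cell; assert stops with an error if not
merge 1:1 year group using `ref', assert(match) nogenerate
assert reldif(myvar1, ref1) < 1e-12
```

If both assertions pass silently, the two methods agree on the toy data. On the real data, the storage type of the summed variables is worth a look, since float totals may lose precision compared with the doubles that collapse uses internally.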


regards
uli


-- 
[email protected]
+49 (030) 25491-361

