Allon.
Here is some code that I used to create a new dataset of 'by group' means
when I was dealing with a dataset too large to collapse on my computer. This
is a similar idea to what Michael suggested, but instead of using postfile
it writes the results out to a new dataset. The key thing to notice here is
that, because the data are sorted by the group variable, replacing earlier
observations with the group means does not interfere with later
calculations: the i-th observation always belongs to the i-th group or an
earlier one, and its value is overwritten only after that group has been
summarized. You should be able to extend this code to create medians as
well.
Cheers,
Steve
step 1: define a local macro listing the variables that mark the groups you
want to collapse by. In your example:
local groups "region6"
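step 2:
#delimit ;
preserve;
qui foreach gp of local groups {;
    sort `gp';
    local i = 1;
    /* -levels- was renamed -levelsof- in Stata 9 */
    levels `gp', local(grp);
    /* one pass per group: store its weighted means in the next free row */
    foreach g of local grp {;
        foreach var of varlist var1 var2 {;
            su `var' [fw=weight5] if `gp'==`g', meanonly;
            replace `var' = r(mean) in `i';
        };
        replace `gp' = `g' in `i';
        local i = `i' + 1;
    };
    /* keep only the rows that now hold the group results */
    local j = `i' - 1;
    keep in 1/`j';
    keep `gp' var1 var2;
    save `gp'_mean, replace;
    restore, preserve;
};
#delimit cr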
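
The extension to medians mentioned above could look roughly like the sketch below. It is not tested on the original data and makes a few assumptions: the variable names (region6, weight5, var1 to var4) are taken from Allon's example, the output file name region6_stats is made up for illustration, region6 is assumed to be numeric, and -levelsof- is used in place of -levels-. The median comes from r(p50) after -summarize, detail-, which accepts frequency weights; whether it matches -collapse (median)- exactly in every tie case has not been checked here.

```stata
* Sketch only: names follow Allon's example; region6_stats is hypothetical.
preserve
sort region6
local i = 1
quietly levelsof region6, local(grp)
quietly foreach g of local grp {
    * weighted means of var1 and var2
    foreach var of varlist var1 var2 {
        summarize `var' [fw=weight5] if region6 == `g', meanonly
        replace `var' = r(mean) in `i'
    }
    * weighted median of var3: -summarize, detail- stores it in r(p50)
    summarize var3 [fw=weight5] if region6 == `g', detail
    replace var3 = r(p50) in `i'
    * weighted sum of var4
    summarize var4 [fw=weight5] if region6 == `g', meanonly
    replace var4 = r(sum) in `i'
    * label the result row with its group value and move to the next row
    replace region6 = `g' in `i'
    local ++i
}
* keep only the rows that now hold one result per group
local last = `i' - 1
keep in 1/`last'
keep region6 var1 var2 var3 var4
save region6_stats, replace
restore
```

Because -summarize, detail- sorts within each group to find percentiles, this pass will be slower than the means-only loop, but it still avoids creating any new variables in the 20 million row dataset.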
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Allon Crazy
Sent: Tuesday, 27 February 2007 1:08 p.m.
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: How to merge individual records to groups in a
large dataset, w/o using collapse
Hi Michael,
thanks for your advice. Separating the dataset into
smaller ones might be better, but I did try that and the pieces are
still too big for my 1 GB of RAM when using "collapse".
An example of what I want to accomplish is:
collapse (mean) var1 var2 (median) var3 (sum) var4 [fw=weight5], by(region6)
I am wondering whether there are any alternative ways
to do the same task that require less RAM.
many thanks
--- Michael Blasnik <michael.blasnik@verizon.net>
wrote:
> If you are looking at -collapse-, then -merge- is the wrong term for Stata
> users (merge means joining tables in Stata).
>
> You would have better luck with answers if you showed us a sample command
> and better described what end result you want. Depending on what group
> summary statistics you want, you may be better off using just pieces of the
> data set and collapsing each one within a loop. With very large datasets, I
> find it usually makes more sense to work from first principles in Stata and
> avoid commands like -collapse- and most -egen- commands as well. You can
> usually accomplish what you want more efficiently, with less memory
> overhead and greater speed, doing it this way.
>
> Michael Blasnik
>
> ----- Original Message -----
> From: "Allon Crazy" <allon_crazy@yahoo.com>
> To: <statalist@hsphsun2.harvard.edu>
> Sent: Monday, February 26, 2007 6:04 PM
> Subject: st: How to merge individual records to
> groups in a large dataset,
> w/o using collapse
>
>
>> I am wondering how to merge individual records to
>> groups for an extremely large dataset (20 million
>> observations), without using collapse. I tried
>> collapse, but my computer does not have enough
>> memory for it because the dataset is too large. I
>> tried egen, but egen does not take sampling weights
>> into consideration. I am wondering whether there is
>> another way or other options.
>>
>> I would greatly appreciate any help.
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>