The Stata listserver

Re: st: frustrated by missing variables--collapase and merge

From   zhou yu <[email protected]>
To   [email protected]
Subject   Re: st: frustrated by missing variables--collapase and merge
Date   Mon, 21 Mar 2005 06:20:57 -0800

Michael, great advice. It will save me a lot of time, and I am very grateful for it!

I have already upgraded to the maximum memory my computer supports, which is 1 GB. I guess I will need to upgrade the computer itself pretty soon.



I've never seen variables disappear like that in Stata, but I do have a 
suggestion.  If you are using such a large dataset that you need virtual 
memory, first I'd suggest buying more memory; it is cheap.  Second, I 
wouldn't use collapse, but would instead write the equivalent commands 
directly.  This approach can often save time, because collapse is a general 
tool and does work that you can skip when you only need one specific result.  
For example, if your dataset has just x1-x5 and you want the means of x1-x4 
by category of x5, I would use:

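* Running sum / running count of nonmissing values: the last observation in each x5 group holds the mean
foreach var of varlist x1 x2 x3 x4 {
    bysort x5: replace `var' = sum(`var')/sum(!missing(`var'))
}
* Keep only that last observation per group
bysort x5: keep if _n==_N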

This approach will minimize the use of memory and should be quicker than 
using collapse, trivially for small datasets but perhaps noticeably in a 
large dataset.
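
One way to gain confidence in this replacement is to run both versions on a small dataset and compare the results. The following is only a sketch under stated assumptions: Stata's shipped auto dataset stands in for the real data, with price and mpg playing the role of x1 and x2 and foreign playing the role of x5. The names price_c, mpg_c, and ref are made up for the illustration, and the merge 1:1 syntax assumes Stata 11 or later.

```stata
* Sketch only: auto.dta stands in for the real data
* (price, mpg ~ x1, x2; foreign ~ x5)
sysuse auto, clear
keep price mpg foreign

* Store as double so the running means are not rounded to integers or floats
recast double price mpg

* Reference result from collapse, saved to a temporary file
preserve
collapse (mean) price mpg, by(foreign)
rename price price_c
rename mpg mpg_c
tempfile ref
save `ref'
restore

* Manual version: running sum / running count, then keep the last row per group
foreach var of varlist price mpg {
    bysort foreign: replace `var' = sum(`var')/sum(!missing(`var'))
}
bysort foreign: keep if _n==_N

* Stop with an error if any expected variable has gone missing
confirm variable price mpg foreign

* Compare against the collapse result; assert stops with an error on any mismatch
merge 1:1 foreign using `ref'
assert _merge == 3
assert reldif(price, price_c) < 1e-6
assert reldif(mpg, mpg_c) < 1e-6
```

If all the assert lines pass silently, the two approaches agree on this data. The confirm variable line can also be placed after an ordinary collapse to catch a missing variable immediately, rather than discovering it later in a merge.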

Michael Blasnik
[email protected]

----- Original Message ----- 
From: "Zhou YU" <[email protected]>
To: <[email protected]>
Sent: Tuesday, March 19, 2002 12:43 AM
Subject: Re: st: frustrated by missing variables--collapase and merge
> When I collapse x1 x2 x3 x4 by x5, I expect to have x1, x2, x3, x4 and x5 
> in my newly created dataset. However, the resulting dataset is sometimes 
> missing x1, x2, x3, x4 or x5, and sometimes no variables are missing. 
> Which variables go missing seems to be random.  If a variable is missing, 
> I have to repeat the procedure, which is very time consuming.
> One possible reason might be that my original dataset is quite large. I 
> have to use virtual memory and set the memory to almost 1G. The problem 
> does not seem to occur when I collapse a small dataset. I thought someone 
> might have a silver bullet to solve the problem, such as changing some 
> settings.
> Thanks a bunch.
> Zhou