Stata The Stata listserver

Re: st: frustrated by missing variables--collapase and merge


From   zhou yu <zyu@usc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: frustrated by missing variables--collapase and merge
Date   Mon, 21 Mar 2005 06:20:57 -0800

Michael, great advice.  It will save me a lot of time. I am very grateful for this!

I have upgraded to the maximum memory my computer supports, which is 1G.  I guess I will need to upgrade my computer pretty soon.


Best,

Zhou

I've never seen variables disappear like that in Stata, but I do have a 
suggestion.  If you are using such a large dataset and need virtual memory, 
first I'd suggest buying more memory; it is cheap.  Second, I wouldn't use 
collapse, but would instead write the equivalent commands directly.  This 
approach can often save time by avoiding work that collapse has to do 
because it is a general tool, while you only need a specific result.  For 
example, if your dataset has just x1-x5 and you want the means of x1-x4 by 
category of x5, I would use:

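* running sum / running count of non-missing values = running group mean
foreach var of varlist x1 x2 x3 x4 {
    bysort x5: replace `var'=sum(`var')/sum(`var'!=.)
}
* the last observation in each x5 group now holds the group means
bysort x5: keep if _n==_N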

This approach will minimize the use of memory and should be quicker than 
using collapse, trivially for small datasets but perhaps noticeably in a 
large dataset.
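
As a rough check of the idea, here is a minimal sketch on a made-up toy dataset (the values and the choice of only two variables, x1 and x2, are assumptions for illustration, not from the original data). Within each x5 group, sum() accumulates a running total that treats missing values as zero, and sum(`var'!=.) counts the non-missing values so far, so the last row of each group ends up holding the mean over the non-missing values.

```stata
* Toy data (hypothetical values): two groups in x5, one missing value in x1
clear
input x1 x2 x5
1 10 1
3 20 1
. 30 1
5 40 2
7 60 2
end

* Replace each variable with its running mean within x5
foreach var of varlist x1 x2 {
    bysort x5: replace `var' = sum(`var') / sum(`var' != .)
}

* Keep only the last row of each group, which holds the full group mean
bysort x5: keep if _n == _N

list
```

Here group 1 should end up with x1 = (1+3)/2 = 2 and x2 = 20, and group 2 with x1 = 6 and x2 = 50. Note that this overwrites the data in memory, just as collapse does, so save the original first if it is still needed.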

Michael Blasnik
michael.blasnik@verizon.net

----- Original Message ----- 
From: "Zhou YU" <zyu@usc.edu>
To: <statalist@hsphsun2.harvard.edu>
Sent: Tuesday, March 19, 2002 12:43 AM
Subject: Re: st: frustrated by missing variables--collapase and merge
>
> When I collapse x1 x2 x3 x4 by x5, I expect to have x1, x2, x3, x4 and x5 
> in my newly created dataset. However, the resulting dataset is sometimes 
> missing x1, x2, x3, x4 or x5, and sometimes no variables are missing. 
> Which variables go missing seems to be random.  If a variable is missing, I 
> have to repeat the procedure, which is very time consuming.
>
> One possible reason might be that my original dataset is quite large. I have 
> to use virtual memory and set the memory to almost 1G. The problem does not 
> seem to be significant when I collapse a small dataset. I thought 
> someone might have a silver bullet to solve the problem by changing some 
> settings.
>
> Thanks a bunch.
>
> Zhou
> 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



