Statalist The Stata Listserver



st: Re: memory problem where over 50% of memory are free


From   "Michael Blasnik" <[email protected]>
To   <[email protected]>
Subject   st: Re: memory problem where over 50% of memory are free
Date   Tue, 10 Apr 2007 07:10:44 -0400

First, you will likely need to allocate more memory when working with a large and narrow dataset like this.
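As a rough sketch of what allocating more memory looks like, assuming Stata 11 or earlier (where data memory is set by hand), the steps might be as follows. The size 1g and the file name mydata are placeholders, not values from this thread.

```stata
* Assumes Stata 11 or earlier; later versions manage data memory automatically.
clear            // memory can only be reallocated when no data are loaded
set memory 1g    // placeholder size; choose what your machine can spare
use mydata       // placeholder file name
memory           // report how the allocated memory is being used
```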

Second, I would try a more direct approach to the calculations whenever possible, for both memory and speed. The -egen- command may create several variables during its execution: a temporary variable to hold the result until it's done, a variable to flag the sample to include in the calculations, a pseudo-variable used by Stata for the sorting, and perhaps even another copy of the original variable (I haven't checked the max() code, but since it accepts expressions it may store the result of that expression). Anyway, if you don't have missing values on year2, it would be much more memory efficient (and faster) to do the calculation directly:

sort persnr year2
* after the sort, the last observation in each persnr group holds the largest year2
by persnr: gen int maxyear2=year2[_N]
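To see why sorting makes the group maximum cheap to obtain, here is a small check on made-up data with no missing values. Once the data are sorted by persnr and year2, the last observation of each group carries the largest year2, so year2[_N] under -by- returns the group maximum. Note that the max() function in -generate- works across its arguments within one observation, not down a group. The toy values and the variable name check are invented for the illustration.

```stata
clear
input long persnr int year2
1 1999
1 2003
1 2001
2 1995
2 1998
end

sort persnr year2
by persnr: gen int maxyear2 = year2[_N]   // last obs in each group = largest year2

* cross-check against -egen- on this tiny dataset
by persnr: egen int check = max(year2)
assert maxyear2 == check
list, sepby(persnr)
```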

If you do have missing values on year2, it becomes a little more complicated and you will need to generate a byte variable to track those observations and issue a few extra commands:

gen byte touse=year2<.
sort persnr touse year2
by persnr touse: gen int maxyear2=year2[_N] if touse
drop touse
bysort persnr (maxyear2): replace maxyear2=maxyear2[1]
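A possible shorter variant, checked here on made-up data, relies on the sort order alone. Within each persnr the observations with touse==1 sort after those with touse==0, so the last observation in the group already holds the largest nonmissing year2 and no second sort is needed. This is a sketch of an alternative, not the steps listed above; the toy values and the variable name check are invented. A person with only missing years ends up with a missing maximum, which is also what -egen- returns.

```stata
clear
input long persnr int year2
1 1999
1 .
1 2003
2 .
2 .
3 1995
end

gen byte touse = year2 < .                // 1 if year2 is nonmissing
sort persnr touse year2                   // nonmissing years sort last within persnr
by persnr: gen int maxyear2 = year2[_N]   // last obs = largest nonmissing year2
drop touse

* cross-check against -egen-, which ignores missing values
by persnr: egen int check = max(year2)
assert maxyear2 == check
list, sepby(persnr)
```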

I find that I often need only about twice the required minimum memory to work with big datasets, but if the datasets are very narrow, like yours, I often need triple the required memory because some commands need to add the equivalent of several more variables while they are executing.
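The arithmetic behind this rule of thumb can be sketched for the dataset in the question. The six variables take 4+2+1+2+1+2 = 12 bytes per observation. Assuming a further 4 bytes of per-observation overhead (my assumption about how older Stata versions count size), the total reproduces the 336,665,536 bytes shown by -describe-. Each extra int adds about 42 MB across 21 million observations, so a few temporary variables add up quickly. The scalar names are invented for the illustration.

```stata
scalar nobs  = 21041596
scalar width = 4 + 2 + 1 + 2 + 1 + 2      // long + int + byte + int + byte + int
scalar bytes = nobs * (width + 4)         // assumption: 4 bytes overhead per obs

display "data size in bytes:   " %15.0fc scalar(bytes)
display "one extra int, in MB: " %9.1f 2*scalar(nobs)/1024^2
display "twice the data, MB:   " %9.1f 2*scalar(bytes)/1024^2
display "triple the data, MB:  " %9.1f 3*scalar(bytes)/1024^2
```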

Michael Blasnik.

----- Original Message ----- From: "Stephan Brunow" <[email protected]>
To: <[email protected]>
Sent: Tuesday, April 10, 2007 5:10 AM
Subject: st: memory problem where over 50% of memory are free



Dear Statalisters,

I have a problem concerning memory. I have a quite large
dataset. Even if I use just 6 variables,


obs:    21,041,596
vars:             6
size:   336,665,536 (56.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
persnr          long   %12.0g
year1           int    %8.0g
month1          byte   %8.0g
year2           int    %8.0g
month2          byte   %8.0g
util            int    %8.0g
-------------------------------------------------------------------------------

I set the memory quite large:
<snip>
At least, over 50% of the allowed memory is free. There should be enough room
to generate 2 or 3 integer variables. However, if I do the following I
receive the error message that there is no room to add a variable due to
width. I can neither compress the data nor drop variables, since the data are
already compressed and I need these 6 variables.

Here is the command:

. by persnr, sort: egen int maxyear2=max(year2)

What might be the problem, what should I do?


