Statalist



Re: st: RE: -expand-, -expandcl-, and -set mem-; limit to the number of obs?


From   "Eric A. Booth" <ebooth@ppri.tamu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: -expand-, -expandcl-, and -set mem-; limit to the number of obs?
Date   Sun, 11 Oct 2009 18:48:37 -0500


Misha wrote:

Sometimes I can only set the memory to 16g
(if I ask for more I get the "op. sys. refuses to provide memory"
message); sometimes I can get only 32g; and sometimes I can get 100g.
What could be the problem?

It sounds like you've got Stata set to use virtual memory (-set virtual on-), which is why you are seeing different results from -set mem-. As Martin's link to my earlier posting indicates, you can 'step up' your memory allocation to get the most out of it, but your system will still limit what is allocated based on how much of your physical RAM and your virtual-memory swap/page-file space (e.g., hard drive or mounted drive space) is available. So, my guess is that you get 32GB instead of 100GB when other processes are using your machine's resources, leaving less available for virtual memory. Check Task Manager in Windows or Activity Monitor in Mac OS (or type "top -c" in *nix) while you are trying to open the dataset. How much physical RAM do you have on your machine?
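You can also check, from within Stata itself, how much memory Stata has actually been given and which memory settings are in effect:

**********
* report current memory allocation and usage:
memory
* list the memory-related settings (mem, virtual, etc.):
query memory
**********

Comparing -memory- against what your OS monitor reports should make it clearer whether the limit you are hitting is Stata's allocation or the system's.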

Misha wrote:
"Why am I asking for so much memory?", you might ask.  Well, I have a
data set that, when expanded, ought to give me about 2.63e+09 (i.e.,
nearly three billion) observations.

How large is the .dta file you are using (not in observations, but in terms of disk space)? Keep in mind that the "size" of your dataset is more than just the number of observations: how many variables are in the dataset? How wide (in characters/digits) are those variables? Are there labels, notes, or other characteristics stored in your .dta file? All of these contribute to the amount of memory needed to open the file in Stata (this is also why it is difficult to do a simple "back-of-the-envelope" calculation of the exact amount of memory you need). For example, look at the difference in the reported size and percentage of free memory from -describe- in these two examples, where the number of variables increases by only one:

**********
set virtual on
* Example 1: add one float variable
clear all
set mem 1g
set obs 1000000
desc
gen i = 1
desc
* Example 2: add one str244 variable on top of another
clear all
set mem 1g
set obs 1000000
gen str244 i = "a"
desc
gen str244 i2 = "a"
desc
**********
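Relatedly, -describe- saves the number of observations and the width (bytes per observation) in r(), so you can compute a rough lower bound on the data area a dataset needs (this ignores Stata's overhead for variable storage, labels, and working room):

**********
sysuse auto, clear
desc, short
* r(width) is bytes per observation, r(N) the number of obs:
di %20.0fc r(N)*r(width)
**********

By that arithmetic, 2.63e+09 observations of even a single float (4 bytes each) already need roughly 10.5GB before any overhead, and every additional variable multiplies that.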

Compressing (-compress-) the dataset or recasting (-recast-) the variables can help if the dataset is near the memory limit, but if it is that large, you should probably consider using only the variables you need during each step of the analysis (by specifying a varlist in the -use- command) or, if possible, breaking your dataset up into smaller chunks.
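For example (the filename and variable names below are just placeholders for your own dataset):

**********
* load only the variables and observations you need
* (mydata.dta, id, x1, x2, and group are placeholders):
use id x1 x2 using mydata.dta if group == 1, clear
* shrink storage types where no precision is lost:
compress
**********

Subsetting at -use- time means the dropped variables never occupy memory at all, which is much cheaper than loading everything and then dropping.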

Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754



On Oct 11, 2009, at 5:39 AM, Martin Weiss wrote:


<>

Look at http://www.stata.com/statalist/archive/2009-07/msg00899.html
and http://www.stata.com/support/faqs/win/winmemory.html



HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Misha Spisok
Sent: Sunday, 11 October 2009 11:51
To: statalist@hsphsun2.harvard.edu
Subject: st: -expand-, -expandcl-, and -set mem-; limit to the number of
obs?

Hello, Statalist!

I have a few questions about Stata's ability to handle billions of
observations.

On the Stata webpage, "Which Stata is right for me," it indicates that
the number of observations is unlimited for Stata versions other than
Small Stata.

The network computer I'm using has Stata 11.0 SE and claims to have
113,000MB of RAM available.  At one point I managed to set the memory
to 100g.  However, on subsequent tries (after logging out and logging
in), I get mixed results.  Sometimes I can only set the memory to 16g
(if I ask for more I get the "op. sys. refuses to provide memory"
message); sometimes I can get only 32g; and sometimes I can get 100g.
What could be the problem?

"Why am I asking for so much memory?", you might ask.  Well, I have a
data set that, when expanded, ought to give me about 2.63e+09 (i.e.,
nearly three billion) observations.  Whether I use -expand- or
-expandcl- I run into the same problem, getting the message "no room
to add more observations, etc."  I've compressed and dropped as much
as I can, but I still get this problem.  Am I asking too much of Stata
and/or "only" 113,000MB of RAM?  Is there a back-of-the-envelope way
to calculate how much RAM I would need to hold a given dataset?

Thank you for your time and attention.

Misha
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




