Statalist



Re: st: Is a data set with size 296,242,984 too big for stata to analyze?


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Is a data set with size 296,242,984 too big for stata to analyze?
Date   Mon, 10 Nov 2008 13:40:57 -0500

Hello Mandy,

Below is a pessimistic scenario showing that you might not be able to
work with such a large dataset in 32-bit Stata. It makes extreme
assumptions to show that this can happen, but it need not be true in
your situation.

Suppose you have 2 variables of type byte, X and Y. The width of this
dataset is 2 bytes, and with the file size you quote you have
~148,121,492 obs. The actual memory consumption is 8+2 = 10 bytes (in
MP) per observation, making it 1,481,214,920 bytes, or about 1.4 GB.
You've managed to load this dataset into memory, so you probably have
MORE than 2 variables, or they are of wider types, or you have Stata SE.
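
The arithmetic above can be reproduced in a do-file. This is only a
sketch of the pessimistic scenario: the 2-byte width and the 8 bytes of
per-observation overhead are the assumptions stated in this message,
not values measured on your data.

```stata
* Back-of-the-envelope check of the scenario above (illustration only).
* Assumed: two byte variables (width 2) and 8 bytes of overhead per
* observation, as quoted in this message for Stata/MP.
local size  = 296242984          // data size in bytes, from the subject line
local width = 2                  // two variables of type byte
local obs   = `size'/`width'     // implied number of observations
local total = `obs'*(8 + `width')

display %15.0fc `obs'            // 148,121,492 observations
display %15.0fc `total'          // 1,481,214,920 bytes
display %6.2f `total'/1024^3     // about 1.38, i.e. roughly 1.4 GB
```

A wider dataset of the same size has fewer observations, so the
overhead term shrinks accordingly.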

The key pieces of information that are missing are the number of
variables in your compressed dataset and the amount of memory free
after you load the dataset. Regress will attempt to create temporary
variables (at least one, to store e(sample), but maybe more; the
exact number is undocumented). If there is no space to accommodate
those temporary variables, you will not be able to work with this
dataset. Also, please specify whether by "size" you mean "size of data
in bytes", as I do, or "size of used memory", as Martin does. The
difference is the overhead, and judging by the numbers Martin
reported, I am confident that he is using an MP version of Stata :)
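
One way to collect the missing information might look like the sketch
below. The file name mydata is hypothetical, and the exact layout of
the output differs between Stata versions, so treat this as a starting
point rather than a recipe.

```stata
* Sketch: report the number of variables, the data size, and memory use.
* "mydata" is a placeholder for your own dataset.
use mydata, clear
compress            // store each variable in the smallest type that fits
describe, short     // number of obs, number of vars, and size in bytes
memory              // memory usage report, including overhead
```

Posting the output of the last two commands would make it possible to
tell whether there is room left for the temporary variables.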

Best regards, Sergiy Radyakin


On Sun, Nov 9, 2008 at 6:37 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
> As others have said, it all depends on your OS, available memory, version of
> Stata and so on. The answer to your question is: It depends... I have just
> tried to drive up the size of the nlsw88 dataset by repeatedly -expand-ing
> it
>
> **********
> sysuse nlsw88.dta, clear
> expand 3800
> d
> **********
>
> which easily took me to  "size: 298,718,000" without any complaints from
> Stata. I had -set mem 1G- beforehand. -des- returned that over 75% of the
> memory was still free...
>
>
> HTH
> Martin
> _______________________
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>