The Stata listserver

st: Re: Re: still troubles in loading a big dataset in Stata---- Help Please!!!


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Re: still troubles in loading a big dataset in Stata---- Help Please!!!
Date   Sat, 6 Sep 2003 11:16:31 -0400

I have to disagree with at least one point made by Ada Ma.

I frequently work with large datasets and find that you can do most data
analysis tasks with only a little more memory (~20%) assigned to Stata than
the dataset size.  You certainly do not need to assign 1200MB to Stata to
run regressions on a 600MB dataset, and it can be a very bad idea to assign
so much memory.  When the memory assigned to Stata approaches or exceeds
the physical memory of the machine, virtual memory is used and everything
slows down dramatically.  Therefore, when loading large datasets, assigning
too much memory to Stata can create problems.
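
As a rough illustration of the ~20% rule of thumb, a 600MB dataset would call
for something like 720MB rather than 1200MB.  The file name big.dta is
hypothetical, and the figure should stay comfortably below the machine's
physical memory:

```stata
* Sketch only: big.dta stands in for a dataset of roughly 600MB.
clear
* 600MB plus about 20% headroom
set memory 720m
use big
* report how much of the allocated memory is actually in use
memory
```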

Also, I haven't seen a text file grow by 6x-8x when insheeting into Stata.

Getting back to the original question, if -set mem 726m- still runs into the
same memory shortage problem, then assign more memory until you can insheet
the file.  You could also use a text editor and try grabbing the first few
hundred or thousand lines and insheeting them separately.  Then you could
see how the data are being stored by Stata, how large the full dataset is
likely to be (multiply bytes/obs by full # obs), and maybe come up with an
approach to make it smaller when exporting it -- e.g., are there
opportunities for value labels to replace long strings?
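
A minimal sketch of that check, assuming the first thousand or so lines
(including the header line) have been saved by hand as sample.txt.  The file
name, the observation count of the full file, and the string variable
-region- are all made up for illustration:

```stata
* Sketch only: sample.txt, the value of fullN and the variable -region-
* are assumptions, not details from the original question.
clear
insheet using sample.txt, tab
* see how each variable is being stored (str80 where a byte would do?)
describe
* describe leaves the width of one observation, in bytes, in r(width)
quietly describe, short
local width = r(width)
* ASSUMED number of observations in the full file
local fullN = 5000000
display "approx. MB for the full dataset: " `width' * `fullN' / 1024^2

* If a long string takes only a few distinct values, store it as a
* value-labeled numeric variable instead.
encode region, generate(region_code)
drop region
* shrink every variable to the smallest storage type that holds it
compress
describe
```

The estimate is only the data portion (bytes/obs times # obs), so -set mem-
still needs some headroom on top of it.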

Michael Blasnik
michael.blasnik@verizon.net


---- Original Message ----- 
From: "Ada Ma" <Pelikan_4001@hotmail.com>
To: <statalist@hsphsun2.harvard.edu>
Sent: Saturday, September 06, 2003 9:47 AM
Subject: st: Re: still troubles in loading a big dataset in Stata---- Help
Please!!!


> Shqiponja,
>
> _set memory_ is the relevant command here, although you might not be
> choosing the right amount.
>
> I suggest that:
>
> (1) find out how big your file is before you decide on the memory size.
> (where did you get that 726 from anyway?)  If your data is about 600MB,
> you'll have to set your memory at 1200MB or more to do the most basic
> regressions, etc.  Given that you're reading in tab-delimited files,
> the file size might grow even bigger as you read it into Stata.  My
> experience is that the ratio is more like 6 to 8 times as big as your
> original text file.
>
> (2) find out how much memory you have on your computer.  Newer computers
> will tell you that they haven't got enough memory (the op. sys. refuses to
> allocate more memory).  Old ones will just put up with it and pretend to be
> running (you can wait for days before you find out!): you can set mem equal
> to 1000MB even though the machine has only 64MB of physical memory.  If you
> haven't got enough memory, you'll need a new computer. (Yay!!)
>
> There are other ways to sort this problem out but these two should get you
> going.
>
> Ada Ma
> Department of Economics
> University of Aberdeen, Scotland
> Pelikan_4001@hotmail.com
>
>


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


