Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Machine spec for 70GB data
From: Daniel Feenberg <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Machine spec for 70GB data
Date: Sat, 22 Oct 2011 08:47:59 -0400 (EDT)
On Sat, 22 Oct 2011, Gindo Tampubolon wrote:
Dear all,
I need to process a large data file [70GB; a few million obs] with 
Stata 12 MP8. Mainly to do cross-random effects, individuals and 
hospitals, where the outcome is length of stay [controlling for no more 
than a handful of covariates to begin with]. As an approximation, the 
outcome is treated as continuous, i.e. linear mixed models.
What kind of machine spec would be needed? Any ideas, information, 
experience? Would the operating system make any difference? I'm open to 
considering Windows, Linux, and OS X.
Once you have 64-bit versions of both the operating system and Stata, the 
choice between Linux and Windows won't make much difference, but you 
really need to establish how much memory you will need. Machines that 
offer more than 24GB of memory are much more expensive than smaller 
machines, so you can save quite a bit if you can limit your maximum 
"set memory" to 18GB or so.
If you are able to read a subset of the data into a machine you already
have, that can give you an idea of how much memory you will need for the 
full dataset. You say "a few million observations", but unless "few" 
really means thousands (of millions) you should be able to get by with 
far less than 70GB of memory. You don't say how many variables you have, 
or how many are float or int. If you have 250 ints, you can store nearly 
a million observations per GB. Stata doesn't need much more memory than 
what is used for the data itself.
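The back-of-envelope arithmetic behind that figure is just bytes per observation times observations. A minimal sketch (in Python rather than Stata, purely for illustration; the variable counts and byte sizes are parameters, not facts about Gindo's dataset):

```python
def obs_per_gb(n_vars, bytes_per_var):
    """Rough number of observations that fit in 1 GB of data memory:
    1 GB divided by the per-observation storage footprint."""
    bytes_per_obs = n_vars * bytes_per_var
    return (1024 ** 3) // bytes_per_obs

# 250 variables at 4 bytes each (e.g. Stata float/long storage):
# 1000 bytes per observation, so roughly a million observations per GB.
print(obs_per_gb(250, 4))   # → 1073741

# Stata's int type is stored in 2 bytes, which doubles the capacity.
print(obs_per_gb(250, 2))   # → 2147483
```

Running the estimate with your actual variable count and storage types (from -describe- on a subset) gives a direct memory target for the full dataset.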
I have posted some suggestions for working with large datasets in Stata at
  http://www.nber.org/sys-admin/large-stata-datasets.html
the main point of which is that if you separate the sample selection from 
the analysis steps, it is possible to work with very large datasets in 
reasonable core sizes (if the analysis is only on a subset, of course).
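In Stata itself, that separation is done by loading only the analysis variables and observations, e.g. -use varlist if exp using filename-, so the full 70GB file never has to fit in core. The same idea, sketched in Python over a toy CSV export (the file layout and variable names here are hypothetical, only for illustration):

```python
import csv
import io

# Toy stand-in for a large export; in practice this would be a file handle.
data = io.StringIO(
    "patient_id,hospital_id,length_of_stay,age,region\n"
    "1,10,3,65,N\n"
    "2,11,12,70,S\n"
    "3,10,5,58,N\n"
)

keep = ["hospital_id", "length_of_stay", "age"]  # analysis variables only

# Stream row by row, applying the sample-selection condition as we read,
# so memory holds only the selected subset rather than the full file.
rows = []
for rec in csv.DictReader(data):
    if rec["region"] == "N":    # sample-selection step
        rows.append({k: rec[k] for k in keep})

print(len(rows))   # → 2 observations kept out of 3
```

The point is the same either way: do the selection at read time, and the analysis step only ever sees the subset it needs.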
There is some information on the Stata website:
  http://www.stata.com/support/faqs/win/winmemory.html
  http://www.stata.com/support/faqs/data/dataset.html
It is possible to get computers with up to 256GB of memory for 
reasonable prices (for some definitions of reasonable, such as 
US$25,000), and that can be convenient. It probably isn't necessary, 
though.
Dan Feenberg
Many thanks,
Gindo
University of Manchester
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*