
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: Machine spec for 70GB data


From   Daniel Feenberg <feenberg@nber.org>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Machine spec for 70GB data
Date   Sat, 22 Oct 2011 08:47:59 -0400 (EDT)


On Sat, 22 Oct 2011, Gindo Tampubolon wrote:

Dear all,

I need to process a large data file [70GB; a few million obs] with Stata 12 MP8, mainly to fit crossed random effects for individuals and hospitals, where the outcome is length of stay [controlling for no more than a handful of covariates to begin with]. As an approximation, the outcome is treated as continuous, i.e. linear mixed models.

What kind of machine spec would be needed? Any ideas, information, experience? Would operating system make any difference? I'm open to consider Windows, Linux, OS X.

Once you have the 64-bit versions of the operating system and of Stata, Linux versus Windows won't make much difference, but you really need to establish how much memory you will need. Machines that offer more than 24GB of memory are much more expensive than smaller machines, so you can save quite a bit if you can limit your maximum "set memory" to 18 GB or so.

If you are able to read a subset of the data into a machine you already have, that can give you an idea of how much memory you will need for the full dataset. You say "a few million observations" but unless "few" means thousands you should be able to get by with far less than 70GB of memory. You don't say how many variables, or how many are float or int. If you have 250 variables stored as 4-byte longs or floats, you can store about a million observations per GB (Stata's 2-byte ints would roughly double that). Stata doesn't need much more memory than that which is used for the data.
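The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough illustration, not part of Stata: the function names and the 250-variable layout are hypothetical, and the byte widths are Stata's documented numeric storage types (byte 1, int 2, long 4, float 4, double 8). String variables and Stata's own overhead are ignored.

```python
# Bytes per value for Stata's numeric storage types.
STATA_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}

GIB = 2**30  # one gigabyte, counted as 2^30 bytes


def bytes_per_observation(var_counts):
    """Width of one observation, given a mapping {storage type: number of variables}."""
    return sum(STATA_BYTES[t] * n for t, n in var_counts.items())


def observations_per_gb(var_counts):
    """How many whole observations fit in one gigabyte."""
    return GIB // bytes_per_observation(var_counts)


def dataset_gb(n_obs, var_counts):
    """Approximate memory, in gigabytes, for n_obs observations (data only)."""
    return n_obs * bytes_per_observation(var_counts) / GIB


if __name__ == "__main__":
    layout = {"float": 250}  # hypothetical layout: 250 float variables
    print(bytes_per_observation(layout))             # bytes in one observation
    print(observations_per_gb(layout))               # observations per GB
    print(round(dataset_gb(5_000_000, layout), 2))   # GB for 5 million observations
```

Plugging in the variable counts from a small extract of the real file gives a first guess at whether the full dataset would fit under a given memory budget.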

I have posted some suggestions for working with large datasets in Stata at

  http://www.nber.org/sys-admin/large-stata-datasets.html

the main point of which is that if you separate the sample selection from the analysis steps, it is possible to work with very large datasets in a reasonable amount of memory (provided the analysis runs only on a subset, of course).
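The idea of separating selection from analysis can be illustrated outside Stata with a small Python sketch. It is not the method described on the linked page: the column names, the sample data, and the helper functions are all hypothetical. The selection step streams the file one row at a time and keeps only the rows and columns the analysis needs, so only the subset is ever held in memory. In Stata itself, reading a subset directly with `use varlist if condition using filename` serves a similar purpose.

```python
import csv
import io


def select_rows(lines, keep_columns, predicate):
    """Selection step: stream rows, yielding only the wanted columns of rows that pass."""
    for row in csv.DictReader(lines):
        if predicate(row):
            yield {c: row[c] for c in keep_columns}


def mean_of(rows, column):
    """Analysis step: mean of one numeric column over the already-selected rows."""
    total = 0.0
    count = 0
    for row in rows:
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Hypothetical stand-in for a large file; a real run would pass an open file object.
SAMPLE = """hospital,patient,los,region
A,1,3,north
A,2,5,south
B,3,4,north
B,4,10,north
"""

if __name__ == "__main__":
    subset = list(
        select_rows(io.StringIO(SAMPLE), ["hospital", "los"],
                    lambda r: r["region"] == "north")
    )
    print(len(subset))             # number of rows kept by the selection step
    print(mean_of(subset, "los"))  # analysis runs on the subset only
```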

There is some information on the Stata website:

  http://www.stata.com/support/faqs/win/winmemory.html
  http://www.stata.com/support/faqs/data/dataset.html

It is possible to get computers with up to 256 GB of memory for reasonable prices (for some definitions of reasonable, such as $US25,000) and that can be convenient. It probably isn't necessary, though.

Dan Feenberg




Many thanks,
Gindo
University of Manchester
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP