Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Machine spec for 70GB data
From: Daniel Feenberg <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Machine spec for 70GB data
Date: Sat, 22 Oct 2011 08:47:59 -0400 (EDT)
On Sat, 22 Oct 2011, Gindo Tampubolon wrote:
Dear all,
I need to process a large data file [70GB; a few million obs] with 
Stata 12 MP8. Mainly to do cross-random effects, individuals and 
hospitals, where the outcome is length of stay [controlling for no more 
than a handful of covariates to begin with]. As an approximation, the 
outcome is treated as continuous, i.e. linear mixed models.
What kind of machine spec would be needed? Any ideas, information, 
experience? Would the operating system make any difference? I'm open to 
considering Windows, Linux, and OS X.
Once you have 64-bit versions of both the operating system and Stata, the 
choice between Linux and Windows won't make much difference, but you 
really need to establish how much memory you will need. Machines that 
offer more than 24GB of memory are much more expensive than smaller 
machines, so you can save quite a bit if you can limit your maximum 
"set memory" to 18GB or so.
If you are able to read a subset of the data into a machine you already
have, that can give you an idea of how much memory you will need for the 
full dataset. You say "a few million observations", but unless "few" 
really means thousands (of millions) you should be able to get by with 
far less than 70GB of memory. You don't say how many variables you have, 
or how many are float or int. If you have 250 ints, you can store nearly 
a million observations per GB. Stata doesn't need much more memory than 
what is used for the data itself.
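The back-of-envelope arithmetic behind that figure is just bytes per observation times observations. A minimal sketch (in Python rather than Stata, purely for illustration; the variable counts and byte sizes are parameters, not facts about Gindo's dataset):

```python
def obs_per_gb(n_vars, bytes_per_var):
    """Rough number of observations that fit in 1 GB of data memory:
    1 GB divided by the per-observation storage footprint."""
    bytes_per_obs = n_vars * bytes_per_var
    return (1024 ** 3) // bytes_per_obs

# 250 variables at 4 bytes each (e.g. Stata float/long storage):
# 1000 bytes per observation, so roughly a million observations per GB.
print(obs_per_gb(250, 4))   # → 1073741

# Stata's int type is stored in 2 bytes, which doubles the capacity.
print(obs_per_gb(250, 2))   # → 2147483
```

Running the estimate with your actual variable count and storage types (from -describe- on a subset) gives a direct memory target for the full dataset.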
I have posted some suggestions for working with large datasets in Stata at
  http://www.nber.org/sys-admin/large-stata-datasets.html
the main point of which is that if you separate the sample selection from 
the analysis steps, it is possible to work with very large datasets in 
reasonable core sizes (if the analysis is only on a subset, of course).
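In Stata itself, that separation is done by loading only the analysis variables and observations, e.g. -use varlist if exp using filename-, so the full 70GB file never has to fit in core. The same idea, sketched in Python over a toy CSV export (the file layout and variable names here are hypothetical, only for illustration):

```python
import csv
import io

# Toy stand-in for a large export; in practice this would be a file handle.
data = io.StringIO(
    "patient_id,hospital_id,length_of_stay,age,region\n"
    "1,10,3,65,N\n"
    "2,11,12,70,S\n"
    "3,10,5,58,N\n"
)

keep = ["hospital_id", "length_of_stay", "age"]  # analysis variables only

# Stream row by row, applying the sample-selection condition as we read,
# so memory holds only the selected subset rather than the full file.
rows = []
for rec in csv.DictReader(data):
    if rec["region"] == "N":    # sample-selection step
        rows.append({k: rec[k] for k in keep})

print(len(rows))   # → 2 observations kept out of 3
```

The point is the same either way: do the selection at read time, and the analysis step only ever sees the subset it needs.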
There is some information on the Stata website:
  http://www.stata.com/support/faqs/win/winmemory.html
  http://www.stata.com/support/faqs/data/dataset.html
It is possible to get computers with up to 256GB of memory for 
reasonable prices (for some definitions of reasonable, such as 
US$25,000), and that can be convenient. It probably isn't necessary, 
though.
Dan Feenberg
Many thanks,
Gindo
University of Manchester
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*