



Re: st: Best machine to build for running STATA


From   Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Best machine to build for running STATA
Date   Mon, 22 Feb 2010 17:56:49 -0800

Dear Dana

I am glad that you found this information helpful... here is more information about your queries...

On 2010-02-22 4.50 PM, Dana Chandler wrote:
Hi Michael -

This is extremely helpful. I really appreciate the link to the report
and the estimation of how much memory a dataset will take up.

I have a few follow-up questions:
Re: Memory allocation... Christopher Baum mentions that for a 13 GB
dataset, 24 GB of RAM would be recommended. Are there any rules of
thumb for how much memory a system should have to comfortably
analyze a dataset of a given size? Also, how much memory should be
allocated to a dataset? If you have a 250 MB dataset, is allocating
1 GB overkill, and can it be harmful?

As described at http://www.stata.com/support/faqs/win/pcreqs.html, Stata recommends "50% more memory than the size of your largest dataset". This is because Stata needs contiguous memory, and the operating system sometimes chops the available memory into blocks that are not contiguous. You may have seen this yourself: Windows reports 1.8 gigabytes of memory available, yet Stata can only allocate 1.0 gigabyte. A related issue, raised by others in today's threads, is the benefit of a 64-bit OS for being able to access more memory.
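To make the rule of thumb concrete, here is a rough sketch of the arithmetic (my own illustration, not anything from Stata's FAQ beyond the 50% figure): a dataset occupies roughly observations times width in bytes per observation, and the recommendation adds 50% headroom on top. The 10-million-observation, 25-byte-wide example is hypothetical.

```python
def recommended_memory_mb(n_obs, bytes_per_obs, headroom=1.5):
    """Estimate the memory to allocate for a Stata dataset.

    Dataset size is observations * width (bytes per observation);
    the FAQ's rule of thumb adds ~50% headroom because Stata's
    allocation must be contiguous.
    """
    dataset_mb = n_obs * bytes_per_obs / 2**20
    return dataset_mb * headroom

# Hypothetical dataset: 10 million observations, 25 bytes each
# (a handful of float and int variables) is ~238 MB on its own,
# so the rule of thumb suggests allocating roughly 358 MB.
print(round(recommended_memory_mb(10_000_000, 25)))
```

By this reckoning, a 250 MB dataset would call for about 375 MB, so allocating 1 GB is more than the rule of thumb requires but, per the discussion above, not harmful unless it pushes Windows into virtual memory.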

I think allocating more memory than you need is generally harmless... the one exception would be allocating so much memory that Windows starts using virtual memory instead of real memory. In that case, your machine will slow to a crawl as Windows furiously grinds the hard drive for virtual memory.
Re: hardware... I'm not an expert on different kinds of RAM and hard
drives. Does anyone have experience with what types of RAM (SRAM
vs. DRAM vs. ??) or hard drives (SCSI vs. SATA or ATA) might work best
with Stata? What about ways to optimize the page file or use virtual
memory?
I am not an expert in these matters either. My feeling is that hard drive speed is a rather trivial issue with regard to Stata performance, since data files are read into and processed in memory. Memory speed may be more critical. Whatever makes memory fast for other tasks (such as gaming or databases) would likely also make it fast for Stata. In other words, if you find reviews saying a given memory is really fast for general computer applications, it is likely to be useful for Stata as well. How useful, I could not say.

Re: the report... it is mentioned that they chose a "problem size"
that was relatively large to run all the simulations that measured
speed of X processors vs. 1. I may have missed it, but do they ever
mention if the ratio of the gains or the "percentage parallelization"
stays constant as the problem size grows? I frequently encounter
problem sizes larger than those stated and would like to know if the
percentage parallelizations will remain about the same.

This is a very good question. I think that is a question for Stata tech support, for them to speculate on whether they expect the pattern of results to generalize as the problem size grows. My expectation is that the results would hold steady as the problem grew, since the parts that are parallelized are likely the most computationally intensive parts, and hence would form the bulk of the work for large problem sizes. My experience with "xtreg", for example, was that even for pretty large problems I realized the same kinds of performance gains (or slightly higher) as shown in the report. I believe that report is the most comprehensive benchmark I have ever seen of the performance gains from additional processors for any software, and certainly for any statistical software.
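For some intuition on why a constant "percentage parallelization" would mean constant gains, Amdahl's law (my own framing, not something the report necessarily uses) gives the overall speedup on n processors when a fraction p of the work runs in parallel. The 90% figure below is purely illustrative:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: overall speedup on n processors when a
    fraction p of the work runs in parallel and (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative only: if ~90% of a command's work is parallelized,
# the speedup with 2, 4, and 8 cores is about 1.8x, 3.1x, and 4.7x.
for cores in (2, 4, 8):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

The point is that the speedup for a given number of cores depends only on p; if p stays roughly constant as the problem grows, so do the gains, which matches the pattern I saw with "xtreg".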

In short, I think you could obtain relatively small gains in performance with the fastest hard drive in the world, and relatively small gains based on memory speed. My belief is that the two dominant factors for speed will be having sufficient memory (so you never use virtual memory) and having Stata/MP to gain the benefits of multiple processors. Those two factors, I believe, can improve your performance by a factor of three (up to three times faster with four processors than with one). By contrast, I think the gains with respect to hard drives and memory speed would be measured on the order of ten to twenty percent.

I hope that is helpful, and would love others to weigh in with their experience, especially differences of opinion.

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell


On Mon, Feb 22, 2010 at 4:56 PM, Michael Norman Mitchell
<Michael.Norman.Mitchell@gmail.com>  wrote:
Greetings

  Two factors come to my mind as being very important...

  1) Having sufficient memory. This has been discussed today on the
statalist, with links to how you can calculate your memory needs.
  2) Whether you will be using Stata/MP, and how many cores you want to get
(both for your Stata/MP license and physical cores). For large statistical
models, you can save considerable time running models on four cores, for
example. This link contains a detailed report showing the time savings one
gets using Stata/MP, and how much time savings you obtain for each
additional core you add for each command.

http://www.stata.com/statamp/report.pdf

  These are not the only factors, but I feel they are among the major
factors.

I hope this helps,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell

On 2010-02-22 2.27 PM, Dana Chandler wrote:
Hi fellow Statalisters -

I was wondering if anyone has suggestions or guidelines for the ideal
type of machine to build for intensive Stata use.

In particular, if you wanted to run saturated regression models on
large (several-gigabyte) datasets in Stata, what would the ideal
setup be? This is a computer that will be used exclusively for
data-intensive tasks, mostly with Stata.

The only requirement is that it be built on a Windows x86 operating
system. What type of hardware makes for the speediest Stata
experience: hard drive type, RAM type, number of processors, etc.?

Thanks in advance,
Dana
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP