The Stata listserver

Re: st: server level computations


From   Stas Kolenikov <[email protected]>
To   [email protected]
Subject   Re: st: server level computations
Date   Fri, 9 Jul 2004 16:24:15 -0400 (EDT)

> > My first idea was to do something about the temporary files. I don't
> > know enthusiastic about it.) The best option of course would be to
> > have some sort of RAM drive, so that -preserve- would mean copying a
> > segment of RAM to RAM rather than to hard drive.
> Is this a class lab or a research computer?

Research.

Let me tell you what I've also learnt about their future configurations. The
core of their cluster will be 5 dual 3.3GHz Dell machines. I think they
are coming with 1Gb RAM each, but I may be mistaken -- it surely won't be
less than that. One of those dual things goes to job management, so there
will be 4 left for Stata :) (and Matlab). They also have a bunch of other
UNIX stuff around the department that they are going to put together, for
a total of about 15 nodes or so. The machine with the largest RAM will
have 16Gb on it, so they might want to dedicate it to the supersize data
sets.

> Either way I would be surprised if this worked out well. In a research
> computer, the RAM disk will cut the maximum size of a dataset in half
> (RAM would have to be big enough for two copies, one in stata and one in
> /temp) which will cause problems as many really big datasets exist.

How big are yours? The largest in economics I've seen so far were about
500Mb. I am sure it's not the limit, though. Maybe the machine with the
largest RAM might be able to handle that.
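
As a back-of-the-envelope check of the halving argument quoted above, here
is a rough sketch in Python. The function name is made up for illustration,
and it assumes the operating system and Stata itself take no memory, so the
real ceilings would be lower.

```python
# Rough sizing sketch; the function name and the zero-overhead
# assumption are illustrative, not measured on any real machine.
def max_dataset_gb(ram_gb, tmp_on_ram_disk):
    """Largest data set that fits in RAM, in GB.

    With the temporary directory on a RAM disk, -preserve- keeps a second
    copy of the data in RAM, so only half of the memory is usable.
    """
    copies = 2 if tmp_on_ram_disk else 1
    return ram_gb / copies


# The two RAM sizes mentioned in this thread: 1 GB nodes and the 16 GB machine.
for ram_gb in (1, 16):
    print(ram_gb,
          max_dataset_gb(ram_gb, tmp_on_ram_disk=False),
          max_dataset_gb(ram_gb, tmp_on_ram_disk=True))
```

Under these assumptions the 16Gb machine tops out at about 8Gb of data
with a RAM disk, which is the trade-off being described.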

> In a classroom lab, the /temp would need to be many times the size of
> the largest dataset, since several might be saved at once. But that
> presumes that the analyzed datasets are quite small, or preserves will
> start to fail for lack of disk space. Any time you overflow real memory
> into virtual memory or swap, runtimes increase by 3 orders of magnitude,
> so you don't want that to happen very often.

That looks about the same as in Windows, where I also found roughly a
1000-fold increase in runtime.

> In our experience "preserves" are not a major source of time use.
> Virtually all long running jobs use 100% of the CPU, and short running
> jobs don't really matter that much. A teaching environment might be
> different.

Yes, but there is nothing you can do about CPU power except for spending
your research budget on upgrades :). Of course there is an issue with
optimized compiler code, but that is certainly left up to Stata Corp., so
there is little to no room for discussion here. (I do remember from my
matrix computation course that the difference between LAPACK and LINPACK
is a factor of three in some routines, though.)

> I can offer the following observations of Stata from our experience with
> a variety of hardware in a research environment:

Thanks. This may help set expectations right :).

 ---                                    Stas Kolenikov
 --       Ph.D. student in Statistics at UNC-Chapel Hill
 - http://www.komkon.org/~tacik/  -- [email protected]

* This e-mail and all attachments to it are not intended to provide any
* reasonable point of view and was transmitted to you in error. It
* should be immediately deleted by all recipients unless they really
* enjoy communicating with the author :). Other restrictions apply.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


