Re: st: Addressing > 2 gig of RAM
First paragraph: I do have experience with large datasets, running
behavioural simulation models, and misc other things, admittedly all for
my own sake, i.e. not published (I'm small, you see). As for promoting
an agenda, I emphasized that all the comments were my opinions. I never
said they were anything else. And no, I don't have an 'agenda'.
Yes, I know of a system that will allocate 4Gb to a single process
(actually 4Gb minus a few Mb for regular OS functions) - mine. Simple
test, write a little c program that malloc's and free's in a binary
search for the max you can allocate, if you don't believe what the OS
says. I'm using a Linux distro called Slackware, which is stable and
lean. I said that Windows handles memory poorly; that is simply my observation.
I agree that Stata is fine for most applications, but there really are a
lot of applications for which loops are unavoidable, and if you're doing
really big things, which was what I was talking about, and there aren't
automated (compiled) ways to do things then it's time to look to
alternatives. 64-bit _computing_ was the context of the email, not
running a few OLS's or bundled nonlinear searches.
On the chip, absolutely 64 bit helps with speed, having the lovely big
register. I was replying to an email about using a 64-bit system to get
the RAM, though. Anyway, there is justification for a 64-bit CPU using
less than 4Gb RAM on computational grounds. My next personal system will
have <4Gb RAM and a 64-bit chip.
On whether to buy a really expensive system vs. use a cheaper but really
custom system, that obviously depends on your views about money vs.
design. In my view, even buying wholesale, 6Gb of DDR is not cheap. But
that's clearly an issue for the user. You obviously have more money than
little student me, who must rely on building efficient software.
I guess this falls into the old issue in computational mathematics, for
which the conclusion is that it's better to spend some time thinking of
the best way to approach a big problem than to forge ahead bluntly. It
depends on the problem domain, though. Overkill voids that in a lot of cases.
I stand by the comment on Windows 64bit computing being stupid. Again, I
emphasize _computing_. I'm talking about big tasks, where interpreted,
unoptimized code is a prohibitive cost. And remember that RAM issue...
You should note this about the referenced Gauss performance comparisons:
The best scores are obtained under MS's most recent flavour of OS, XP
(also no service packs, all of which drop efficiency in a big way),
whereas the versions of Linux used are obsolete and have been for years
- and they're Linux distributions that are traditionally geared towards
new users because they have many user-friendly add-ons, i.e. not the
ones for doing big computational tasks on. My own experience with RH8.0,
which ranks 4th overall (highest of the Linuxes), is that it's slow and
clunky. I don't think the comparisons are very meaningful. Also, none of
the tests are for 64-bit.
Indeed I don't find too many online performance reviews meaningful.
Linux reviews tend to be Linux-biased, Windows reviews tend to be either
Windows-biased or Linux-biased. And comparisons between Linux and
Windows are usually out-of-the-box, but not too many people who wish to
do computing are going to leave their Linux distro alone, free of
optimization. I go by my own experience when choosing OS and methods of
implementing problems. As I stated at the outset of my previous email,
these are my --opinions--.
A Linux/Unix/BSD setup for doing really large-scale number crunching
will be optimized (something on which there is no comment in the review)
for the task. For example, have a look at ATLAS for the GSL. You'd be
surprised how intuitively the libraries are laid out. Linux kernels are
also more mature with threading.
It is true I didn't give any evidence of my own to back up the claims. I
certainly cannot comment on 64-bit Stata, as I said. I'll give you a
couple of personal examples of why I have that opinion about 64bit Windows:
If I'm running a big task then I want to have an OS that's stable and
that doesn't need to be rebooted. I actually just finished running a job
estimating a whole pile of functions over Australia's digitized census
collection district boundary definitions that took 6 weeks continuous
computing on a 64-bit Solaris 9 quad-processor system (my department's)
- note I only used a single thread, i.e. one processor at any one time. I
would never try that on Windows, for example in that 6 weeks somebody
may wish to install some new software, requiring a reboot... and
although XP and Win2K are pretty stable, they still aren't that great.
For example, the box I'm writing this email on hasn't been turned
off/crashed/restarted since Canberra last had a blackout, which I think
was about 10 weeks ago.
Second, and I think this is quite a telling comparison, though I have
no 64-bit experience with it, is using WINE. WINE ("Wine Is Not an
Emulator") is an implementation of the Windows API that runs on
Unix-like OSes. I run a
dual-boot system (Win2K + Slackware) and have rarely touched the Windows
side in years. When I want to run Windows software, I use WINE. It is
interesting to compare the speeds, though, of tasks under Windows native
and a pretend Windows environment. I have never experienced a task on
WINE _ever_ running slower than the same task on Windows native, and my
experience is it's usually noticeably faster (and it's not my biased
perception in action, before you accuse me).
I actually don't have my own licence for Stata (use the licence at the
research school I work for), but it would be interesting to compare the
performance of 32-bit Stata on the same hardware under WINE on an
optimized Linux versus under a recent Windows. My money would be on Linux, and you'd
have the added bonus of being able to use all your RAM.
Second-final, to 64-bit Windows in general. For the present I can't
afford and don't intend ever to touch a 64-bit Windows. I don't think
I'm preaching here, it's just that _in my experience_ Linux is more
efficient, more optimizable, and more reliable than Windows. For the
last couple of years, my own interests have exclusively involved things
requiring heavy computation. There are also a great swathe of
methodological reasons for using Unix/Linux/BSD, due to the way they're
put together, and the philosophy behind them. Software construction is
far more straight-forward on a Unix-like OS, and the stability and
transparency make everything easier. I don't think a user can appreciate
that, unfortunately, unless by experience.
I should say, Linux requires a significant outlay of time. All this stuff is
great if you're somebody like me who enjoys spending masses of time
wrangling with the innards of an OS. I've been talking about heavy,
large-scale number-crunching. OLS isn't number crunching, except in a
19th century sense, and neither is calibrating most nonlinear models.
As for my Athlon plug, I should have said that this is actually more
applicable to Linux. Linus Torvalds announced when AMD's x86-64 Athlon
64 was released that the kernel team would be designing things with
emphasis on those systems. It's also interesting to note that Intel
adopted the AMD64 architecture in the Pentium 4 line, although they
renamed a few things to not be so obvious.
They did this because their own 64-bit line wasn't competing well with
Athlon, and because Linux was doing mature 64-bit computing well before
Microsoft's first release of 64-bit Windows. Indeed, Microsoft's first
64-bit Windows was properly released only about a month ago, whereas
64-bit Linux has been around since 1994 (the Linux port to the DEC Alpha).
That maturing time is significant. The 64-bit community takes Linux as
the standard, the most basic reason of which is that it's
further-developed. That market, though, was for a long time high-end
users like geeks, airplane manufacturers and meteorologists. It all
comes back to getting the system just so, which is completely different
to the out-of-the-box comparisons I was talking about. You can't really
customize Windows - you can't even get away from the GUI.
Related, for anybody interested, a 128-bit Linux kernel is in
development at the moment :) That's a whole train-track of RAM.
Anyway, I do use Stata, and I like it too. I think it's a great way of
doing what can be done on it. However the world is bigger than that. And
it is never stupid to consider a broader set of approaches to a problem.
That's all I've been saying here - my opinions and why they exist.
Dismiss them if you like, but they have served me, along with millions
of other computer nerds, well.
Michael Blasnik wrote:
It sounds like you don't have any real experience in working with
large datasets (in Stata or otherwise) and have no actual measured
results of different computing platforms in terms of speed, but
instead want to promote some agenda. It may be "rare" for you to work
with very large datasets, but some people do it on a regular basis,
especially those that work with large scale administrative databases
(e.g., national health programs or, in my case, utility company
databases which I admit only tend to run up to about 1GB).
You are incorrect that you only need 64bit computing if you are going
to use more than 4GB of data-- do you know of a 32bit OS/hardware platform
that can allocate 4GB of ram to a process? or even 3GB? There have
been Statalist postings showing about 2GB data area max for Mac OS
10.3 on a G5 and other posts indicating that about 2GB will be the max
for most 32bit OSs. Maybe 32bit Linux can allocate more, but I
haven't heard that. 64bit also provides potential speed improvements,
not just max RAM allocation (see below).
I don't think it makes much sense to start writing your own data
analysis system in C if you already have a developed and tested tool
(like Stata) that does what you want. It is far easier (and cheaper
by an order of magnitude or more) to buy a 64bit platform, load it
with RAM and use Stata than it is to write your own code from
scratch, especially if you want to do many things. Perhaps if you
have one big repeating analysis it may make sense to write some code,
but Stata is actually quite fast even with very large datasets,
especially if you know what you're doing to optimize your Stata code.
While Windows has many shortcomings, your comment that 64bit computing
under Windows is "stupid" does not appear to be backed up by any
actual tests or comparisons. You say that Windows is "inexcusably
slow" but all of the comparisons I've seen posted to Statalist and
elsewhere seem to indicate that, for massive data analysis, the speed
of execution is comparable across equal-bit operating systems.
Comparisons of GAUSS speed on different platforms/OSs at
don't seem to show Windows lagging Linux; results are mixed across
tests. I'd be interested in seeing any actual data you have showing
Win64 being substantially slower at computational tasks than another
OS since I am considering moving to 64bit soon.
There was a Statalist posting that showed moving from 32bit WinXP to
64bit Linux (both with AMD 64bit processors, but different models)
led to 29%-34% speed improvements in calculating some complex gllamm
models. But this difference is between 32bit and 64bit. see:
----- Original Message ----- From: "James Muller"
Sent: Monday, June 20, 2005 8:31 AM
Subject: Re: st: Addressing > 2 gig of RAM
At the outset, I don't have experience with 64-bit Stata at all, but
I can give you some of my _opinions_ on 64-bit computing:
First, you don't need 64 bit if you're going to use no more than 4Gb.
A 32-bit address space is 2^32 bytes = 4Gb. You only need 64 bit if you're
going to need more than 4Gb.
Related, second, are you actually going to need to have >4Gb of data
in memory all at the same time? That is a massive amount of data -
Stata's most detailed data type is double, which is 8 bytes. That
means that you would have to _need_ at some moment in time to be
using more than 2^29 = 536,870,912 double objects, which is rare. There is a strong
argument then to store your data in a dedicated database, pulling out
only the data you need when you need it, and dropping it from memory
when it's not needed. And if you did that then Stata is not the best
way to go - again, it's worth looking at c or fortran. There are good
open source libraries for doing lots of fun things in those
languages, and the methods usually aren't too hard to implement
oneself anyway. In my opinion worth it for such large apps.
Third, if you're actually doing 64 bit computing then you're after
some pretty hardcore efficiency. It is worth at least considering
more scientific methods of computing, for example writing custom
programs for your application in c or fortran. Like I say, I don't
have any experience with 64-bit Stata, but I'd suspect that not too
many people would keep using Stata for such large applications.
Fourth, 64-bit computing under Windows is just plain stupid. Sorry
for anybody who disagrees, but Windows is an operating system that is
inexcusably slow and memory-hungry. Again, 64-bit computing is just
that - computing. You're after efficiency so that you're not waiting
the next 5 months for the task to end, and you certainly want
something stable for time-consuming tasks. Learning to use a proper
OS (and using very efficient software in general) is worth it if
you're getting into the heavy stuff. Additionally, as is pointed out
in a post just before this, Stata needs a contiguous block of memory.
Windows does not handle memory well, i.e. you'll not have the whole
set of RAM available. A shame after investing in it, not to be able
to use it...
Fifth, as far as I'm aware (and I may well have my wires crossed
here), current Mac OS X is actually built on Darwin, a BSD-derived
Unix core rather than the Linux kernel, chosen for efficiency and
stability. While the hardware of G4/similar is excellent, I'd expect
Linux to run better than MacOS on equivalent hardware.
Sixth, and this definitely depends on the scale of your project, if
you're doing stuff that is slow (i.e. trillions and trillions of
calculations) then it's worth looking into parallel processing. This,
however, steps into the realm of employing a programmer or spending
lots of time studying.
Seventh, if you end up going with 64-bit linux, make sure you have an
efficient system. 'Linux' has a big reputation as being fast and
efficient, but many out-of-the-box distributions pile on a bucketload of
features that aren't necessary for a lot of situations. They all eat
up resources and you end up losing a lot of the advantages. Thus, if
you go with something like Fedora Core 64, spend the time giving your
system a good haircut - and look carefully at performance reviews and
comparisons. Also, ask the guys at Stata about whether their 64-bit
Linux Stata will work on the BSDs. BSD is quite nice and should not
be overlooked.
Eighth, if you go with 64-bit and choose a PC for your platform, be
sure to look into the Athlon 64-bit CPUs.
Overall, my opinion is that if somebody's going to spend the cash to
purchase more than 4Gb of RAM and a good 64-bit processor (or
processors) then they should spend the time getting their system to
do justice to that investment. That means really looking into getting
things optimized, which means looking seriously at alternative ways
of approaching the problem.
Righto, my sixteen cents there. Hope it's useful.
* For searches and help try: