It sounds like you don't have any real experience in working with large 
datasets (in Stata or otherwise) and have no actual measured results of 
different computing platforms in terms of speed, but instead want to promote 
some agenda.  It may be "rare" for you to work with very large datasets, but 
some people do it on a regular basis, especially those who work with 
large-scale administrative databases (e.g., national health programs or, in my 
case, utility company databases, which I admit only tend to run up to about 1GB).
You are incorrect that you only need 64bit computing if you are going to use 
more than 4GB of data. Do you know of a 32bit OS/hardware platform that can 
allocate 4GB of RAM to a process? Or even 3GB?  There have been Statalist 
postings showing about 2GB data area max for Mac OS 10.3 on a G5 and other 
posts indicating that about 2GB will be the max for most 32bit OSs.  Maybe 
32bit Linux can allocate more, but I haven't heard that.  64bit also 
provides potential speed improvements, not just max RAM allocation (see 
below).
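To put rough numbers on the gap between the theoretical and the practical 
limit, here is a small Python sketch. The 2GB figure is the approximate 
per-process ceiling reported in the Statalist posts mentioned above, not a 
measured value; the helper name and the use of binary gigabytes are 
assumptions made purely for illustration.

```python
# Rough address-space arithmetic for a 32-bit process (illustrative only).

GIB = 2**30  # one binary gigabyte, in bytes


def addressable_bytes(pointer_bits: int) -> int:
    """Bytes that a flat pointer of the given width can address."""
    return 2**pointer_bits


# Upper bound for any 32-bit process, before the OS reserves anything.
theoretical_32 = addressable_bytes(32)

# Assumption: roughly 2GB usable per process, as reported on Statalist.
practical_32 = 2 * GIB

print(theoretical_32 // GIB)                      # 4
print(practical_32 / theoretical_32)              # 0.5
print(addressable_bytes(64) // theoretical_32)    # 4294967296
```

Under that assumption, a 32bit process gets to use only about half of what 
the pointer width nominally allows, which is why the need for 64bit can 
arrive well before 4GB.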
I don't think it makes much sense to start writing your own data analysis 
system in C if you already have a developed and tested tool (like Stata) 
that does what you want.  It is far easier (and cheaper by an order of 
magnitude or more) to buy a 64bit platform, load it with RAM, and use Stata 
than it is to write your own code from scratch, especially if you want to do 
many things.  Perhaps if you have one big repeating analysis it may make 
sense to write some code, but Stata is actually quite fast even with very 
large datasets, especially if you know what you're doing to optimize your 
Stata code.
While Windows has many shortcomings, your comment that 64bit computing under 
Windows is "stupid" does not appear to be backed up by any actual tests or 
comparisons.  You say that Windows is "inexcusably slow" but all of the 
comparisons I've seen posted to Statalist and elsewhere seem to indicate 
that, for massive data analysis, the speed of execution is comparable across 
equal-bit operating systems.  Comparisons of GAUSS speed on different 
platforms/OSs at
http://www.scientificweb.com/testreport/gaussbench2.html
don't seem to show Windows lagging Linux; results are mixed across tests. 
I'd be interested in seeing any actual data you have showing Win64 being 
substantially slower at computational tasks than another OS since I am 
considering moving to 64bit soon.
There was a Statalist posting that showed moving from 32bit WinXP to 64bit 
Linux (both with AMD 64bit processors, but different models) led to 29%-34% 
speed improvements in calculating some complex gllamm models.  But this 
difference is between 32bit and 64bit, so it does not isolate the effect of 
the OS. See:
http://www.stata.com/statalist/archive/2004-04/msg00620.html
Michael Blasnik
[email protected]
----- Original Message ----- 
From: "James Muller" <[email protected]>
To: <[email protected]>
Sent: Monday, June 20, 2005 8:31 AM
Subject: Re: st: Addressing > 2 gig of RAM
At the outset, I don't have experience with 64-bit Stata at all, but I can 
give you some of my _opinions_ on 64-bit computing:
First, you don't need 64 bit if you're going to use no more than 4GB. A 32-bit 
address space is 2^32 bytes = 4GB. You only actually need 64 bit if you're 
actually going to need more than 4GB.
Related, second, are you actually going to need to have >4GB of data in 
memory all at the same time? That is a massive amount of data - Stata's 
most detailed data type is double, which is 8 bytes. That means that you 
would have to _need_ at some moment in time to be using 2^32/8=2^29=536870912 
double objects, which is rare. There is a strong argument then to store 
your data in a dedicated database, pulling out only the data you need when 
you need it, and dropping it from memory when it's not needed. And if you 
did that then Stata is not the best way to go - again, it's worth looking 
at C or Fortran. There are good open source libraries for doing lots of 
fun things in those languages, and the methods usually aren't too hard to 
implement oneself anyway. In my opinion worth it for such large apps.
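As a quick check on the sizes involved, the sketch below counts how many 
8-byte doubles fit in a given amount of memory, and how many observations 
that allows for a dataset of a given width. The function names and the 
50-variable example are hypothetical, chosen only for illustration, and the 
calculation ignores whatever overhead Stata itself needs.

```python
# Back-of-envelope dataset sizing (illustrative only).

DOUBLE_BYTES = 8  # a Stata double occupies 8 bytes


def doubles_that_fit(memory_bytes: int) -> int:
    """Number of 8-byte doubles that fit in the given memory."""
    return memory_bytes // DOUBLE_BYTES


def max_observations(memory_bytes: int, n_double_vars: int) -> int:
    """Rows that fit if every variable is stored as a double."""
    return memory_bytes // (DOUBLE_BYTES * n_double_vars)


four_gb = 2**32  # bytes

print(doubles_that_fit(four_gb))         # 536870912
print(max_observations(four_gb, 50))     # 10737418
```

So a hypothetical 50-variable dataset held entirely as doubles would reach 
4GB at a little under 11 million observations.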
Third, if you're actually doing 64 bit computing then you're after some 
pretty hardcore efficiency. It is worth at least considering more 
scientific methods of computing, for example writing custom programs for 
your application in C or Fortran. Like I say, I don't have any experience 
with 64-bit Stata, but I'd suspect that not too many people would keep 
using Stata for such large applications.
Fourth, 64-bit computing under Windows is just plain stupid. Sorry for 
anybody who disagrees, but Windows is an operating system that is 
inexcusably slow and memory-hungry. Again, 64-bit computing is just that - 
computing. You're after efficiency so that you're not waiting the next 5 
months for the task to end, and you certainly want something stable for 
time-consuming tasks. Learning to use a proper OS (and using very 
efficient software in general) is worth it if you're getting into the 
heavy stuff. Additionally, as is pointed out in a post just before this, 
Stata needs a contiguous block of memory. Windows does not handle memory 
well, i.e. you'll not have the whole set of RAM available. A shame after 
investing in it, not to be able to use it...
Fifth, as far as I'm aware (and I may well have my wires crossed here), 
current Mac OS's use a variation of the Linux kernel, and do so because they 
are after efficiency and stability. While the hardware of G4/similar is 
excellent, I'd expect Linux to run better than MacOS on equivalent 
hardware.
Sixth, and this definitely depends on the scale of your project, if you're 
doing stuff that is slow (i.e. trillions and trillions of calculations) 
then it's worth looking into parallel processing. This, however, steps 
into the realm of employing a programmer or spending lots of time 
studying.
Seventh, if you end up going with 64-bit Linux, make sure you have an 
efficient system. 'Linux' has a big reputation as being fast and 
efficient, but many out-of-the-box distributions pile a bucketload of 
features that aren't necessary for a lot of situations. They all eat up 
resources and you end up losing a lot of the advantages. Thus, if you go 
with something like Fedora Core 64, spend the time giving your system a 
good haircut - and look carefully at performance reviews and comparisons. 
Also, ask the guys at Stata about whether their 64-bit Linux Stata will 
work on the BSDs. BSD is quite nice and should not be dismissed.
Eighth, if you go with 64-bit and choose a PC for your platform, be sure to 
look into the Athlon 64-bit CPU.
Overall, my opinion is that if somebody's going to spend the cash to 
purchase more than 4GB of RAM and a good 64-bit processor (or processors) 
then they should spend the time getting their system to do justice to that 
investment. That means really looking into getting things optimized, which 
means looking seriously at alternative ways of approaching the problem.
Righto, my sixteen cents there. Hope it's useful.
James
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/