The Stata listserver

Re: st: Macs and 64-bit Memory Addressing


From   cnguyen@stata.com
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Macs and 64-bit Memory Addressing
Date   Mon, 03 Nov 2003 11:10:43 -0600

Michael Ingre <Michael.Ingre@ipm.ki.se> wrote:

> According to apple it is possible right now to gain a lot of speed in
> calculations with 64 bit integers and floats (like Stata does a lot) quite
> easy - if you use gcc 3.3.

Stata benefits from faster double-precision arithmetic, and we get that from
the G5 without having to do anything.  In the tests I've performed, gcc 3.3
has not been shown to increase double-precision performance substantially.

> " By writing the appropriate code, you can maximize your software's use of
> the G5 processor's dual 64-bit floating-point units (FPUs), dual 64-bit
> integer units, and dual load/store units. "

Stata has always been written with greater FPU performance in mind.

> With this compiler flags:
> 
> "-mpowerpc64 
> In combination with the above flags, this flag tells the compiler to enable
> the G5's native 64-bit long long support for greatly enhanced performance
> when working with long longs."

We don't use long longs.  Even if we did, our usage of them would benefit very
little from the enhanced performance.

> -mpowerpc-gpopt 
> In combination with the above flags, this flag tells the compiler to enable
> the G5's hardware floating point square root support for greatly enhanced
> performance. 

There are many compiler flags and special functions you can use that produce
faster code, but at the expense of accuracy.  IBM has a beta compiler with
optimizations that produce incredibly fast code at the expense of accuracy.
It has a fast math optimization flag that sped up a test script that normally
takes a little over 13 seconds on a dual G5 to around ten seconds (maybe even
less than ten).  However, the results it put out were way out of acceptable
tolerance in accuracy.  In case you're interested, that same test took over 17
seconds on a 2.4GHz P4 and less than 17 seconds on a 1.8GHz G5.
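
One common reason "fast math" style optimizations can change results is that
they let the compiler regroup floating-point operations, and double-precision
addition is not associative.  The sketch below is only an illustration of that
general effect, written in Python rather than compiled C; it is not Stata's
code and not the test script mentioned above, and the function names are
hypothetical.

```python
# Illustration only: double-precision addition is not associative, so an
# optimizer that is allowed to regroup terms can change the answer.
# The function names are hypothetical and exist only for this sketch.

def sum_as_written(a, b, c):
    """Evaluate (a + b) + c, the grouping the source code asks for."""
    return (a + b) + c


def sum_regrouped(a, b, c):
    """Evaluate (a + c) + b, a grouping a reassociating optimizer might pick."""
    return (a + c) + b


a, b, c = 1e16, 1.0, -1e16

# At 1e16 the spacing between adjacent doubles is 2, so adding 1.0 is lost.
as_written = sum_as_written(a, b, c)

# Cancelling the two large terms first preserves the small term.
regrouped = sum_regrouped(a, b, c)

print(as_written, regrouped)  # 0.0 1.0
```

The same three numbers give 0.0 in one grouping and 1.0 in the other.  In a
long computation many such small differences can accumulate, which is
consistent with results drifting out of tolerance when aggressive math
optimizations are enabled.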

I compared console versions of Stata compiled with Metrowerks CodeWarrior and
with gcc 3.3 (with G5 optimizations).  CodeWarrior came out ahead in many
areas, and the differences were minor in others.  The IBM compiler might've
been _slightly_ faster, but it's beta and they can just as easily get out of
the Macintosh compiler market as they got in.

There are also problems with the gcc 3.3 optimizations because they cause
Stata to fail some of its certification tests on Panther.  This does not
happen with Jaguar.  If you have to turn off gcc's optimizations, then gcc is
not worth using.

Anyway, let's remember that Apple kept the G5 under wraps for a while and that
Metrowerks (who have been at the compiler game a lot longer than Apple and
saved Apple's bacon when the first PowerPC was released) will add G5 support
to their compiler eventually.  And when they do, I expect the same sort of
minimal speed gains I'm seeing now with gcc 3.3 because the sort of
optimizations they perform are not of great benefit to Stata.

-Chinh Nguyen
 cnguyen@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


