
Re: st: re: building a 'dream' stata desktop setup

From   David Airey <david.airey@Vanderbilt.Edu>
Subject   Re: st: re: building a 'dream' stata desktop setup
Date   Tue, 8 Jul 2008 14:38:55 -0500


Thank you for the very interesting edification. I did not realize how large the gap is between what Intel's recently released software and roadmap offer and what it really takes to use multiple cores efficiently. Just for the record, I happily paid for Stata/MP for 2 cores, and I will have the same attitude if more cores land in front of me. One more question: will you be taking advantage of graphics processors? I've read that they are another way to speed up computation on most computers these days.


On Jul 8, 2008, at 1:41 PM, William Gould, StataCorp LP wrote:

David Airey <david.airey@Vanderbilt.Edu> wrote,

> [...] Intel has now recommended programmers prepare their code for more
> cores than currently on the market or imaginable (i.e., 100s to 1000s). What
> are we going to pay for Stata then? Clearly, Stata is charging more because
> they can and those who buy 8 core machines have money in their pockets. When
> it is the norm to have a larger number of cores, prices will not be by the
> core, or no one will buy Stata.

I suspect David is imagining that all that was required to produce Stata/MP
was recompiling Stata by specifying a compiler option and then selling the
product. If that were the case, I would agree with David.

That is not what we did. Stata/MP was a major rewrite of Stata, the purpose
of which was to work directly with the multiple cores. This involved not just
parallelizing code, but deciding where and how deeply to parallelize, and
rewriting computation algorithms to be amenable to parallelization.

Stata/MP was a major effort and it still is. Multiple developers work full
time parallelizing more and more of Stata.

In fact, nowadays one could produce a multiprocessor product simply by
compiling single-processor code with one of the sophisticated compilers
released in the last few months. The latest Intel compiler has just such a
feature, and as a result, we may be about to see programs, including
statistical packages, that run on "all the cores".

The problem is, such automatic techniques for producing parallel software do
not work nearly as well as custom coding efforts such as those performed
for Stata/MP.

Here's a table:

          ------------------ run time -----------------
                      -- Stata/MP ---  Automatic method
Processors  Perfect     MP-A     MP-E   Alt. 1   Alt. 2
-------------------------------------------------------
         1     1.00     1.00     1.00     1.00     1.00
         2      .50      .72      .57      .94      .87
         4      .25      .50      .35      .90      .81
         8     .125      .42      .24      .89      .77
        40     .025      .35      .15      .87      .75
       400     .003      .33      .13      .87      .74
     4,000    .0003      .33      .13      .87      .74
-------------------------------------------------------
Note: Parallelizable regions are 100% for Perfect, 66.6% for
MP-A, 87% for MP-E, 13% for Alt. 1, and 26% for Alt. 2.

Numbers for Stata/MP based on actual measurement. MP-A
reports results for all Stata commands. MP-E reports
results for all estimation commands.

Alt. 1 is a generous estimate of what can be achieved by
automatic compiler methods today.

Alt. 2 is a generous estimate of what may be achievable by
automatic compiler methods in the future.

Alternatives 1 and 2 above are admittedly made up, but they have been made up
generously. Alternative 1, for instance, is supposed to be what is achievable
by today's compilers, yet using the current Intel compiler, we cannot achieve
such results. The results reported in the Alternative 2 column are about
twice as good as we think are theoretically possible with automated methods.

The numbers in the Stata/MP columns (MP-A and MP-E) are overall observed
averages with an extrapolation to 400 and 4,000 processors.
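
For readers who want to see where numbers like these come from, here is a minimal sketch. It assumes the columns follow Amdahl's law, run time = (1 - p) + p/n, where p is the parallelizable fraction given in the table note and n is the number of processors. That formula is my assumption; the message does not name it. It reproduces the Perfect, MP-E, Alt. 1, and Alt. 2 columns to two decimals. For MP-A on 2 processors it gives about .67 where the table reports .72, which fits the note that the Stata/MP numbers are based on actual measurement. The function and variable names are made up for illustration.

```python
# Sketch only: assumes Amdahl's law, which the original message does not name.
# Parallelizable fractions are taken from the note under the table.
PARALLEL_FRACTION = {
    "Perfect": 1.0,
    "MP-A": 0.666,
    "MP-E": 0.87,
    "Alt. 1": 0.13,
    "Alt. 2": 0.26,
}
PROCESSORS = [1, 2, 4, 8, 40, 400, 4000]


def relative_run_time(parallel_fraction, processors):
    """Run time relative to one processor: serial part plus divided parallel part."""
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / processors
    return serial_part + parallel_part


# Print one row per processor count, one column per scenario.
print("Processors " + " ".join(f"{name:>8}" for name in PARALLEL_FRACTION))
for n in PROCESSORS:
    cells = " ".join(
        f"{relative_run_time(p, n):8.4f}" for p in PARALLEL_FRACTION.values()
    )
    print(f"{n:>10} {cells}")
```

Under this model the run time can never fall below 1 - p no matter how many processors are added, which is why the last two rows of the table barely differ.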

I admit I am in the process of setting up a straw man and knocking him
over. I am setting up the straw man because I suspect the "specify the
option and recompile" model is, unconsciously, the underlying assumption in
everyone's mind when first thinking about this issue.

So let's understand the implications of the table. Stata/MP running
on two cores produces better performance than either automatic alternative
running even on 4,000 cores. Stata/MP on four cores does even better,
and indeed we are charging you for that.
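
The comparison in the paragraph above can be checked directly against the table. The sketch below copies the relevant run times from the table in this message (nothing is recomputed) and converts them to speedups as 1 / run time. The dictionary and variable names are made up for illustration.

```python
# Relative run times copied from the table in this message (1 processor = 1.00).
STATA_MP_TWO_CORES = {"MP-A": 0.72, "MP-E": 0.57}
AUTOMATIC_4000_CORES = {"Alt. 1": 0.87, "Alt. 2": 0.74}

# The slowest Stata/MP figure on 2 cores versus the fastest automatic figure
# on 4,000 cores: the claim holds if the former is still the smaller run time.
worst_stata_mp = max(STATA_MP_TWO_CORES.values())
best_automatic = min(AUTOMATIC_4000_CORES.values())
claim_holds = worst_stata_mp < best_automatic

# Speedup is the reciprocal of relative run time.
for name, run_time in {**STATA_MP_TWO_CORES, **AUTOMATIC_4000_CORES}.items():
    print(f"{name:>7}: run time {run_time:.2f}, speedup {1.0 / run_time:.2f}x")
print("Stata/MP on 2 cores beats both automatic alternatives:", claim_holds)
```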

David is right when he states, "Stata is charging more because they can and
those who buy 8 core machines have money in their pockets". I would say it
differently, of course. I would say that Stata with 4 cores produces a lot
more performance than Stata with 1 or 2 cores, and so the price is justified.

In part, the price is justified because making parallel algorithms work
efficiently on more than two cores requires a surprising amount of
extra work. The problem is, you don't necessarily want to run on all
of the available cores because the setup costs could be too great. Instead,
you must
develop a subsystem that decides problem-by-problem, based on current
conditions, exactly how many processors should be used for each little
piece of the calculation.
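
To make the idea concrete, here is a toy sketch of such a decision. It is purely hypothetical and is not how Stata/MP works internally; the message does not describe the actual subsystem. The cost model (a fixed setup cost for each extra processor, plus an Amdahl-style split of the work), the default parallel fraction, and all names are assumptions chosen for illustration.

```python
# Hypothetical cost model, NOT Stata/MP's actual logic: each extra processor
# adds a fixed setup cost, and only part of the work can be divided.
def estimated_time(work, processors, setup_cost, parallel_fraction):
    """Estimated run time for one piece of a calculation."""
    serial_part = work * (1.0 - parallel_fraction)
    parallel_part = work * parallel_fraction / processors
    overhead = setup_cost * (processors - 1)
    return serial_part + parallel_part + overhead


def choose_processors(work, available, setup_cost, parallel_fraction=0.87):
    """Pick the processor count with the lowest estimated time (ties: fewest)."""
    return min(
        range(1, available + 1),
        key=lambda n: estimated_time(work, n, setup_cost, parallel_fraction),
    )


# A tiny job is not worth splitting; a large one uses every available core.
for work in (1.0, 10.0, 1000.0):
    n = choose_processors(work, available=8, setup_cost=0.5)
    print(f"work={work:>7}: use {n} of 8 processors")
```

With these made-up numbers, the small job stays on 1 processor, the medium job uses 4, and the large job uses all 8, which is the kind of problem-by-problem choice described above.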

Nonetheless, David would be absolutely correct to say that StataCorp chose
to charge more for 4-core Stata relative to 2-core Stata than costs alone
could justify. That's always the case with software: the cost of development
is an up-front cost and afterwards, prices are set to spread those costs (and
profits) in ways that seem equitable.

-- Bill