
Re: st: re: building a 'dream' stata desktop setup

From (William Gould, StataCorp LP)
Subject   Re: st: re: building a 'dream' stata desktop setup
Date   Tue, 08 Jul 2008 13:41:57 -0500

David Airey <david.airey@Vanderbilt.Edu> wrote, 

> [...] Intel has now recommended programmers prepare their code for more
> cores than currently on the market or imaginable (i.e., 100s to 1000s). What
> are we going to pay for Stata then? Clearly, Stata is charging more because
> they can and those who buy 8 core machines have money in their pockets. When
> it is the norm to have a larger number of cores, prices will not be by the
> core, or no one will buy Stata.

I suspect David is imagining that all that was required to produce Stata/MP
was recompiling Stata by specifying a compiler option and then selling the
product.  If that were the case, I would agree with David. 

That is not what we did.  Stata/MP was a major rewrite of Stata, the purpose
of which was to work directly with the multiple cores.  This involved not just
parallelizing code, but deciding where and how deeply to parallelize, and
rewriting computation algorithms to be amenable to parallelization.

Stata/MP was a major effort and it still is.  Multiple developers work full
time parallelizing more and more of Stata.

In fact, nowadays one could produce a multiprocessor product simply by
compiling single-processor code using one of the sophisticated compilers
released in the last few months.  The latest Intel compiler has just such a
feature, and as a result, we may be about to see programs, including
statistical packages, that run on "all the cores".

The problem is, such automatic techniques for producing parallel software do
not work nearly as well as custom coding efforts such as those performed
for Stata/MP.

Here's a table:

                        -------------------- run time -------------------
                                       -- Stata/MP --    Automatic method
      Processors        Perfect        MP-A      MP-E    Alt. 1    Alt. 2
               1           1.00        1.00      1.00      1.00      1.00
               2            .50         .72       .57       .94       .87 
               4            .25         .50       .35       .90       .81
               8            .125        .42       .24       .89       .77
              40            .025        .35       .15       .87       .75
             400            .003        .33       .13       .87       .74
           4,000            .0003       .33       .13       .87       .74
      Note:  Parallelizable regions are 100% for Perfect, 66.6% for
             MP-A, 87% for MP-E, 13% for Alt. 1, and 26% for Alt. 2.

             Numbers for Stata/MP based on actual measurement.  MP-A 
             reports results for all Stata commands.  MP-E reports 
             results for all estimation commands.

             Alt. 1 is a generous estimate of what can be achieved by
             automatic compiler methods today.

             Alt. 2 is a generous estimate of what may be achievable by 
             automatic compiler methods in the future.

Alternatives 1 and 2 above are admittedly made up, but they have been made up
generously.  Alternative 1, for instance, is supposed to be what is achievable 
by today's compilers, yet using the current Intel compiler, we cannot achieve
such results.  The results reported in the Alternative 2 column are about
twice as good as we think are theoretically possible with automated methods.

The numbers in the Stata/MP columns are overall observed averages, with
an extrapolation to 400 and 4,000 processors.
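
For anyone who wants to check the arithmetic, here is a minimal sketch.  It
assumes the run times follow Amdahl's law, time(n) = (1 - p) + p/n, where p
is the parallelizable fraction given in the table's note.  That formula is my
reading of the table, not something stated in it, and the function and
variable names are purely illustrative.

```python
# Sketch only: assumes run time follows Amdahl's law,
#   time(n) = (1 - p) + p / n,
# where p is the parallelizable fraction from the table's note.
# Function and variable names are illustrative, not part of Stata.

PARALLEL_FRACTION = {
    "Perfect": 1.0,
    "MP-A": 0.666,
    "MP-E": 0.87,
    "Alt. 1": 0.13,
    "Alt. 2": 0.26,
}
PROCESSORS = [1, 2, 4, 8, 40, 400, 4000]


def relative_run_time(p, n):
    """Run time on n processors relative to the one-processor run time."""
    return (1.0 - p) + p / n


def run_time_floor(p):
    """Limit of the relative run time as the processor count grows."""
    return 1.0 - p


if __name__ == "__main__":
    names = list(PARALLEL_FRACTION)
    print("Processors".rjust(10) + "".join(name.rjust(10) for name in names))
    for n in PROCESSORS:
        cells = "".join(
            f"{relative_run_time(PARALLEL_FRACTION[name], n):10.4f}"
            for name in names
        )
        print(f"{n:10d}" + cells)
```

Under this assumption the formula gives about .67 for MP-A on two processors,
where the table reports .72; the other entries agree with the formula to
within about .01.  The floor, 1 - p, is why adding processors stops helping:
with p = 26%, no number of cores brings the run time below .74.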

I admit I am in the process of setting up a straw man and knocking him
over.  I am setting up the straw man because I suspect the "specify the
option and recompile" model is, unconsciously, the underlying assumption in
everyone's mind when first thinking about this issue.

So let's understand the implications of the table.  Stata/MP running
on two cores produces better performance than either automatic alternative
running even on 4,000 cores.  Stata/MP on four cores does even better, 
and indeed we are charging you for that. 

David is right when he states, "Stata is charging more because they can and
those who buy 8 core machines have money in their pockets".  I would say it
differently, of course.  I would say that Stata with 4 cores produces a lot
more performance than Stata with 1 or 2 cores, and so the price is justified.

In part, the price is justified because making parallel algorithms work
efficiently on more than two cores requires a surprising amount of 
extra work.  The problem is, you don't necessarily want to run on all 
of them because the setup costs could be too great.  Instead, you must 
develop a subsystem that decides problem-by-problem, based on current 
conditions, exactly how many processors should be used for each little 
piece of the calculation. 
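
To make the setup-cost point concrete, here is a toy sketch.  It is not how
Stata/MP decides anything; the cost model (a fixed setup cost for every
processor beyond the first) and all the names and numbers are assumptions
made up for illustration.

```python
# Sketch only: a made-up cost model, not Stata/MP's actual logic.
# Assumes each processor beyond the first adds a fixed setup cost, so
#   time(n) = serial + parallel / n + setup_per_extra_core * (n - 1).


def estimated_time(n, serial, parallel, setup_per_extra_core):
    """Estimated run time of one piece of work on n processors."""
    return serial + parallel / n + setup_per_extra_core * (n - 1)


def best_processor_count(serial, parallel, setup_per_extra_core, max_cores):
    """Processor count in 1..max_cores with the lowest estimated time.

    On ties, the smallest count wins because min() keeps the first minimum.
    """
    return min(
        range(1, max_cores + 1),
        key=lambda n: estimated_time(n, serial, parallel, setup_per_extra_core),
    )


if __name__ == "__main__":
    # A large piece of work and a tiny one, on the same 8-core machine.
    for parallel_work in (1.0, 0.01):
        n = best_processor_count(0.0, parallel_work, 0.04, 8)
        print(f"parallel work {parallel_work}: use {n} of 8 cores")
```

Even in this toy model the answer changes with the size of the piece: the
large piece is best split across some but not all of the cores, and the tiny
piece is best left on one.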

Nonetheless, David would be absolutely correct to say that StataCorp chose
to charge more for 4-core Stata, relative to 2-core Stata, than costs could
justify.  That's
always the case with software:  the cost of development is an up-front cost
and afterwards, prices are set to spread those costs (and profits) in ways
that seem equitable.

-- Bill