[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: on 32/64-bit OS, CPU and Stata

From	[email protected] (William Gould, Stata)
To	[email protected]
Subject	Re: st: on 32/64-bit OS, CPU and Stata
Date	Fri, 29 Jul 2005 09:11:32 -0500
There have recently been a number of postings about the speed of 32-bit 
Stata and 64-bit Stata, summarized by Alan Riley's <[email protected]> 
summary

                  32-bit     64-bit
       -----------------------------------------
       sort        2.00       1.64   seconds
       regress      .88        .80
       -----------------------------------------
       timings from 1.4 GHz AMD Opteron 240 
       running 64-bit Windows.

Some are probably wondering why the numbers in the second column are 
not 1.00 and .44.  After all 64 is twice 32, so shouldn't the run times 
be half?  Or at least approximately half?

We all know how to interpret clock-cycle speeds.  A 1.4 GHz computer 
runs twice as fast as as a .7 GHz computer.  Why don't bit widths work 
this way?


Computer width
--------------

Micro processors have grown from being 4 bit, to 8, to 16, to 32, and now, 
to 64.  What does this mean?

One way to picture a computer is as a machine with gears and a crank 
sticking out of the side:

                                             +--==     <- crank
                  +---------------------+    |
                  |  ||   ||   ||   ||  |    |
      shaft ->    +---------------------+----+  
                  |  ||   ||   ||   ||  |
                  +---------------------+
               
                         ||
                    The  --  is one gear on a shaft.  
                         ||

The picture above is an illustration of a 4-bit computer.

The number of gears corresponds to the width of the computer.  The rate at
which you turn the crank corresponds to the clock speed.  (On a 1.4 GHz
computer, you turn the crank 1.4 billion times per second.)

On this 4-bit computer, every turn of the crank does something to 4 bits.
If we doubled the number of gears -- made an 8-bit computer -- every 
turn of the crank would do something to 8 bits.


Completing the picture
----------------------

Our picture is not yet complete because we must add memory, so here's 
the completed picture

        <--- computer, which slides left and right ---->

                                         +--==    
              +---------------------+    |
              |  ||   ||   ||   ||  |    |
              +---------------------+----+                      (cpu)
                 ||   ||   ||   ||
  =============================================================
 ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
    --------------------------------------------------------  (memory)
 ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
 ------------------------------------------------------------------

In this picture, The computer slides left and right on a rail (illustrated
by ====).  Below the rail is memory, another set of gears, to which the 
gears on the computer mesh.  


Putting the computer to work
----------------------------

Let's say we want to do a calculation on a 4-bit number.

To do that, we line up the cpu with appropriate position in memory -- 
the position that contains the 4-bit number on which we want to operate -- 
and we turn the crank.


Working with number longer than 4 bits
--------------------------------------

Let's say we want to do a calculation on an 8-bit number.  Using our 4-bit
computer, the following is a slight oversimplification, but not much:  We line
up our computer on the first 4 bits of the 8-bit number, turn the crank, the
shift our computer right 4 bits, and turn the crank again.

This leads to the following general rule:

     R1.  If we have a k-bit wide computer and performing an operation
          on a k-bit number requires n turns of the crank, then 
          performing the operation on a m>k bit number requires 
          (m/k)*n turns of the crank, plus m/k realignments.  

          R1 applies only when m/k is an integer >= 1.


Working with numbers shorter than 4 bits
----------------------------------------

Let's say we want to do a calculation on a 2-bit number.  (In a real 
computer, such problems arise all the time.  On a 32-bit computer, 
we may want to make a calculation on 8-bit values (i.e., 1-byte values, a.k.a.
characters), or 16-bit values (i.e., 2-byte values, a.k.a. short integers).
On a 64-bit computer, we may want want to make calculations on 8-bit values,
16-bit values, or 32-bit values (i.e., 4-byte values, a.k.a. integers).

Problem is, the way our illustrated computer works, it cannot work on less
than 4-bit values because it has four gears and each meshes with memory.

We need to take a detour and talk about alignment.


Alignment
---------

I drew my illustration pretty carefully.  Look again:

                                             +--==    
                  +---------------------+    |
                  |  ||   ||   ||   ||  |    |
                  +---------------------+----+  
                     ||   ||   ||   ||
      =============================================================
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
        --------------------------------------------------------
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
     --------------------------------------------------------------

     ...  2    3      4    5    6    7      8    9   10   11   ...

Below the illustration, I have numbered the gears.  Think of those 
as memory positions.

Notice how the memory occurs in 4-bit groups on this 4-bit computer.
As our computer slides left and right along memory, it can align only 
at certain places.  We can line up our cpu on gear 4, or gear 8, etc., 
but not on, say, gear 3:

                                      +--==    
           +---------------------+    |
           |  ||   ||   ||   ||  |    |
           +---------------------+----+  
              ||   ||   ||   ||
      =============================================================
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
        --------------------------------------------------------
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
     ---------------------------------------------------------------

     ...  2    3      4    5    6    7      8    9   10   11   ...

Above I aligned the left-most gear of our cpu with gear 3 of memory, 
and now the remaining gears do not mesh.  

Modern computers are like that.  It's a rule of construction.


Back to working with numbers shorter than 4 bits
------------------------------------------------

Let's just draw the memory part of our computer:


      =============================================================
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
        --------------------------------------------------------
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
     ---------------------------------------------------------------

     ...  2    3      4    5    6    7      8    9   10   11   ...

Remember, we are going to make a 2-bit calculation.  The two bits, 
without loss of generality, could be at (3,4), or (4,5), or (6,7), 
or (7,8).  

Actually, because of the alignment problem I described above, positions 
(3,4), (5,6), and (7,8) are not a possibility.  Our computer cannot do 
that.  That leaves (4,5) and (6,7).

Let's consider the two cases.


The bits are at gears (6,7)
---------------------------

We align on cpu on bit 4, so that we have:


                                             +--==    
                  +---------------------+    |
                  |  ||   ||   ||   ||  |    |
                  +---------------------+----+  
                     ||   ||   ||   ||
      =============================================================
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
        --------------------------------------------------------
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
     ---------------------------------------------------------------

     ...  2    3      4    5    6    7      8    9   10   11   ...

Then we do the following:

    1.  We turn the crank once.  That loads bits (4,5,6,7) into our cpu.

    2.  We throw a switch on the crank, and turn it again, and 
        that clears the first two gears on our cpu.  

We are then ready to make the calculation.


The bits are at gears (4,5)
---------------------------

We start with the same alignment:  we align on cpu on gear 4:



                                             +--==    
                  +---------------------+    |
                  |  ||   ||   ||   ||  |    |
                  +---------------------+----+  
                     ||   ||   ||   ||
      =============================================================
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
        --------------------------------------------------------
     ... ||   ||     ||   ||   ||   ||     ||   ||   ||   ||   ...
     ---------------------------------------------------------------

     ...  2    3      4    5    6    7      8    9   10   11   ...

Then we do the following:


    1.  We turn the crank once.  That loads bits (4,5,6,7) into our cpu.

    2.  We throw another switch on the crank, and turn the crank again, and 
        that copies gears (4,5) to (5,6).

    3.  We throw the original switch, and then the crank again, and 
        that clears gears 4 and 5.

We are now ready to make the calculation.


There are other cases, too
--------------------------

There are other cases that do not arise on our 4-bit computer doing 2-bit
calculations.  They would arise if we considered 1-bit calculations.  The bit
of interest might not be on the left or on the right, but in the middle.

You can work that one out for yourself.  It requires yet another turn of 
crank.


Summary, working with number shorter than 4 bits
-----------------------------------------------

     R2.  If we have a k-bit wide computer and performing an operation
          on a k-bit number requires n turns of the crank, then 
          performing the operation on an m<k bit number requires 
          n+2, n+3, or n+4 turns of the crank, depending on where the number 
          is located.

          R2 applies only when k/m is an integer > 1.


Surprise!
---------

Holding clock speed constant, 64-bit computers are not faster at everything
than 32-bit computers!  

Say you want to do a calculation on a 32-bit number that requires n turns 
of the crank on a 32-bit computer.  That calculation will require 
n+2 or n+3 or n+4 turns of the crank on a 64-bit computer!

In modern computers, what we want to do can often be done in one or two turns
of the crank.  On a 64-bit computer, the overhead for such calculations is
enormous.

So now let's consider the problem carefully, and let me add some more 
information:

     1.  One reason to have a k-bit computer is because you want 
         to perform calculations on m>=k quantities.

         In the case of 64-bit computers and Stata, that corresponds to 
         to double-precision numbers.  Stata does lots of those 
         calculations.

         64-bit computers will perform calculations faster on 64-bit 
         quantities.

     2.  64-bit computers will perform calculations slower on 32-bit 
         quantities.  32-bit quantities arise all the time in programs.
         They are used, for instance, as loop counters, as indexes, etc.

         However, some 64-bit computers are faster than others when working 
         with short quantities.  In terms of our illustration, I showed 
         a k-bit computer has having its memory organized in k-bit 
         groups.  The computer does not have to be designed that way, 
         although it adds considerable complication to the design if 
         the manufacturer relaxes the constraint.  

         Intel (and AMD) relax it.  This saves lots of turns of the crank
         and alleviates many of the disadvantages of making the computer 
         wider.  

Finally, 64-bit computers can address more memory, but my illustration 
does not demonstrate that.

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Follow-Ups:
- Re: st: on 32/64-bit OS, CPU and Stata
  - From: Nick Winter <[email protected]>
Prev by Date: st: Re: RE: MATCHING METHODS
Next by Date: st: Re: vanelteren
Previous by thread: Re: st: on 32/64-bit OS, CPU and Stata
Next by thread: Re: st: on 32/64-bit OS, CPU and Stata
Index(es):
- Date
- Thread