Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Views = pass by reference?


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Views = pass by reference?
Date   Fri, 12 Sep 2008 11:42:53 -0500

Ashim Kapoor <ashimkapoor@gmail.com> asks, 

> Are views in Mata something like a pass by reference in C++? 
> Is that the reason why they work faster?

No.  

Ashim asks two questions, and each below has its own answer.  You 
can read one without the other.


Question 1:  Pass by reference?  What does that mean?
-----------------------------------------------------

Forget Stata and Mata for the minute, and let's just think about computer
languages in general.  Assume the computer language provides functions and
let's think about how the concept f(x,y) might be implemented on a modern 
digital computer.

Modern computers are stack oriented like the old HP calculator.
Do you remember the old HP?  To calculate 2+3, you keyed 

       2  LOAD   3   +

and what happened was the stack started empty:

             -------------------
             -------------------

You pressed 2 LOAD and the the number 2 was pushed onto the stack,

             -------------------
             2
             -------------------

You pressed 3 and the number 3 was pushed onto the stack, 

             -------------------
             2
             3
             -------------------

Finally you pressed + and the top two elements are the stack were summed, 
the two original elements popped off the stack, and the sum pushed back 
on, resulting in 

             -------------------
             5
             -------------------

Now, if you wanted to calculate (2+3)*7, you would just continue, by 
keying 7 LOAD *.

Modern computers work just like that and, in most modern languages, 
f(x,y) is implemented as a push of x, a push of y, and then the calling 
of the code for f().  The function f() uses the top two elements of the 
stack to calculate f(x,y), and then pops the two originals, and pushes 
the result.

In the case of the the hand calculate, it's pretty obvious what push 
means:  it means "push the value entered".  In a modern computer, 
the are two possible meanings for push:  (1) "Push the value entered", 
just as with a hand calculator, or (2) "Push the address where the value 
entered is stored".

On modern computers, we have memory where we can store values, and memory 
locations are numbered 0, 1, ..., etc.  If you have a 64 megabyte computer, 
the top memory position is 67,108,863 (i.e., 64*1024*1024-1).  

So let's consider f(3,4).  Design 1 dictates we push 3 onto the stack, push
4 onto the stack, and transfer control to the address where the code for 
f() is stored.  Design 2 says we push the address in memory where the 3 
is stored, push the address where 4 is stored, and, just as in Design 1, 
transfer control to the address where the code for f() is stored.

Why on earth on a computer designer want to push addresses rather than the
values themselves?  Well, let's consider the case where x and y in f(x,y) are
allowed to be matrices.  Let's say x is 40x40.  If we push values themseleves,
then in the case of x, we will need to push 1,600 values onto the stack, in
effect making an entire copy of the matrix!  The same logic will apply to y.
If we push addresses, however, we still only push two things on the stack:
the address where x is stored, and the address where y is stored.

Anyway, "Pass by reference" (a.k.a. "Pass by address", "Pass by name")
is what Design 2 is called.  Design 1 is known as "Pass by value".

The Fortran porgramming language uses "Pass by reference", although 
back when Fortran was designed, it was called "Pass by address".  
The C programming language uses a combination of "Pass by reference" 
and "Pass by address":  For integers and characters, C uses "Pass 
by value".  For floats and doubles, which is used depends on the 
age of the C compiler, whether the computer is 32 or 64 bit, and is 
further complicated by how the numerical coprocessor is designed.
As a general rule, think of floats and doubles as being passed by 
value.  In the case of C, an overriding rule is that arrays are always
passed by reference.

Mata may look like C, but in terms of argument passing, Mata is like 
Fortran.  Mata always uses pass by reference, and in fact uses the 
"Pass by address" flavor of "Pass by reference", which is merely to 
say, the definition of a reference is Mata is a fully resolved address.

Each method has its advantages and disadvantages.  Pass by reference uses less
memory and computer time when arguments are aggregates, such as arrays.  Pass
by value uses less memory and computer time when arguments are scalars.  Pass
by value allows functions to modify arguments with changing the underlying
value of the caller's code.  I say that in a positive way, but I can say the
opposite in a positive way, too:  Pass by reference allows the functions to
modify arguments and thus change underlying values in the caller's code, so
more than one result can be returned.

Anyway, Mata uses "Pass by reference" in ALL cases, scalars, views, and
nonviews.


Question 2:  So what makes views faster?
----------------------------------------

First, I must correct you.  Views are not necessarily faster than nonviews.
Views are faster in some cases, and slower in others, and views are more 
often slower than faster.  What makes views interesting is that they use 
less memory.  That can be of great importance when dealing with large 
datasets.

Nonviews are easy enough to understand.  Consider a 3 x 4 matrix.  The
elements of the matrix are lined up one after the other, rowwise, in memory.
Think of a matrix as a long vector (x11, x12, x13, x14, x21, x22, ...x24, 
x31, ..., x34).  If you, in your code refer to x[i,j], the appropriate 
element can be found by selecting the (i*4+j)th element of that vector. 
Assume elements are of length 8 (they are) and the matrix is stored at 
address 100 (it isn't).  Then 100+(i*4+j)*8 takes us right to where we
need to be.  A simple linear calculation which can be made very quickly
is all that is required to access the desired element.

Views, on the other hand, are in fact complicated multipart tables so that, 
if you code x[2,3], Mata can reasonably quickly figure out you want
Stata variable 14, observation 29.  Then Mata calls Stata to get the value.

So let's review.  We want to access x[i,j]:

                                                       Execution time 
                                                       ---------------
     Nonviews:
         1.  Calculate 100*(i*4+j)*8                    t  (t small)
         2.  Access memory value                        e  (approx. 0)
         --------------------------------------------------------------
             Total                                     t+e
          
      Views:
        1.  Translate i,j into var. #, obs. #           T  (T >> t)
        2.  Call Stata to get value                     E  (E >> e)
        ---------------------------------------------------------------
            Total                                      T+E (T+E >> t+e)


First point:  Why use views?  Because they save memory at the cost of 
execution time.

Second point:  The TOTAL cost is not as great as you are now thinking.
In fact, there are cases where the TOTAL cost is less for Views than 
for Nonviews.

That nonview matrix did not come from nowhere.  The very fact that 
we are comparing Views and Nonviews tells us that the nonview matrix 
came from Stata data.  Ergo, when the nonview matrix was created, the
values so nicely lined up in one location after another had to be obtained
from Stata.  The cost of obtaining those values was between R*C*E and
R*C*(E+T), where R=# of rows, C=# of cols.  The time cost was certainly
greater than R*C*E, because E is just the time Stata needs given a 
var. # and obs. #, and we had to figure out which var. # and 
obs. # we wanted.  How long it took to do that is certainly less than T,
however, because we could proceed sequentially in the determination.  
Thus,

     Nonview setup time:
           R*C*(E+u),   0 < u < T

The implication of the above is that views are faster than nonviews when we
need only to access a subset of the data.  If we are going to access all the
elements, as long as we access them just once, the extra time cost of a view
is R*C*(T-u).  It turns out that u is much closer to T than it is to 0, so
R*C*(T-u) is small.

Views become costly in execution time relative to nonviews when the elements 
will be accessed repeatedly. 

I suppose I should apologize to the majority of listers for having bored 
them with a long explanation just so I could say, "Views are faster than 
nonviews if you are only going to access a subset of the elements, and  
Views are nearly as fast as Nonviews if you access the elements only once."

On the other hand, if you enjoyed these details, I recommend you
consider applying to StataCorp for employment.  For the bored nonprogrammers, 
let me explain that writing something like above after code is written is a 
a simple exercise.  We write things like the above BEFORE code is written,
the point being predict behavior and thereby compare approaches before 
undertaking the expensive step of actually writing the code.


-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index