Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: global variables vs program arguments in Mata


From   [email protected] (William Gould, Stata)
To   [email protected]
Subject   Re: st: global variables vs program arguments in Mata
Date   Tue, 05 Dec 2006 08:52:24 -0600

Zurab Sajaia <[email protected]> is wondering about "best practice, in terms
of efficiency" in a particular Mata programming problem:

> I'm writing program in Mata where some initial values are defined at the 
> begining and then a subroutine recursively itterates on them.
> Currently it looks something like:
>
>       void sub(a , b)
>       {
>               some operations with a and b
>               ...
>               sub(a, b)
>       }
>
>       void main()
>       {
>               a=...
>               b=...
>               sub(a, b)
>       }

Zurab notes that parameters a and b are essentially global-like, and 
wonders in particular if it would be better if a and b were coded as 
real globals.

Zurab has already coded it right in my opinion.

Zurab is right in suspecting that there is a tradeoff between how he 
has done it and the global-variable approach.  What Zurab has done will 
run faster.  Using globals would use less memory.

To explain, Zurab is asking whether he should restructure his program 
to read 

          void main()
          {
                 external a, b

                 a = ...
                 b = ...
                 sub()
          }

         void sub()
         {
                 external a, b

                 some operations with a and b
                 ...
                 sub()
        }

This will run slower because the -external- statement is timewise expensive.
What Mata compiles this version of sub(), Mata makes a note in the executable
to find global variables named "a" and "b" because it is the memory addresses
of a and b that Mata needs.  The result of this is that, every time sub() is
executed, it executes that note.  sub() goes off looking into Stata's and
Mata's memory for the Mata global named "a" and gets its address, and then
does the same for "b".  To parallel a joke, that takes a long time in dog
years, it takes a long time in computer years, too.

In Zurab's original, sub() has arguments a and b.  Those are the addresses 
of a and b.  Thus, in the original, the name lookup is skipped.

That's why Zurab's original will run faster. 

Zurab's original uses more memory, however, and let me show why that is.
When you call a program that takes two arguments, Mata and C essentially 
do the same thing.  They "push the two arguments onto the stack" and then 
transfer control to sub().  When sub() returns, they "pop the two arguments
off the stack".

The "stack" is a list of values.  In Mata's case, those values are memory 
addresses, so if you use a 32-bit computer, each stack element requires 
32 bits (4 bytes), and if you use a 64-bit ocmputer, each stack element 
requires 64 bits (8 Bytes).  The stack is organized First In Last Out 
(FILO), and adding elements to the stack is called "pushing" because not
only is the value copied, but the stack itself grows.  It grows in Mata's
case by 4 or 8 bytes.  Removing elements from the stack is called "popping"
because the stack gets smaller.

So let's imagine Zurab's sub() routine calls itself 1,000 times.   
The 1,000-th time it calls itself, the stack will have 1,000*2 = 2,000 
elements on it.  Say Zurab is using a 32-bit computer.  Then the stack 
will be consuming 4*2,000 = 8,000 bytes.  It will be consuming 16,000
bytes if Zurab uses a 64-bit computer.  

In my opinion, the speed issue trumps the memory issue, at least if the 
number of recursive calls is around the thousands.  Now if Zurab told me
that his subroutine was going to call itself a 3 million times, I might change
my opinion.  The excess memory consumption would grow to 12 or 24 megs.
Of course, that would be the case where Zurab would be most intertested in 
speed, and so I would ask Zurab to consider carefully whether he could 
afford the memory.

In general it is better to pass values as arguments.  In fact, if Zurab had
some reason where he wanted a and b to be global for some other reason, I
would recommend he write his code as follows:

          void main()
          {
                 external a, b

                 a = ...
                 b = ...
                 sub()
          }

         void sub(a , b)
         {
                 some operations with a and b
                 ...
                 sub(a, b)
        }

In this case, main() performs the global lookup, once, and sub() receives
the addresses of the globals.  From sub()'s persepective, the fact that 
a and b are global is irrelevant.

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index