Statalist



Re: st: Mata "no room to add more symbols" error


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Mata "no room to add more symbols" error
Date   Thu, 18 Jun 2009 08:59:57 -0500

John Bates <lirac271@yahoo.com> asks, 

> I am running into a weird problem with Mata.  Is there a limit to how many
> variables you can declare inside a function?

Yes.  The current limit is 200.

John goes on to explain, 

> When I now add the following line to the top of my main function:
> 
>         real scalar newest_var
> 
> I get the following error when I try to compile (the error occurs a few
> hundred lines down after the declaration):
>
> no room to add more symbols
> r(3000);

John may wonder whether, because the error occurred a few hundred lines 
after the declaration, there is something else going on.  There is not.

John mentioned that eliminating one variable in his big program "resolves the
problem", but went on to note, "I do not get an error if I add additional
variables in my other (much smaller) functions".  John is obviously surprised
at that.  That is because each function is compiled separately, so the 200
limit applies to each function on its own.

Mata is a compiler.  The term "compiler" is loaded with meaning.  Compilers
translate one computer language to another.  The Mata compiler translates
Mata source code to numeric codes that later can be executed very rapidly.  In
the process of performing the compilation, Mata must make a list of the
variables and functions used inside the program and assign to each a physical
address in memory.  That list is called a symbol table.  The symbol table is
used during the compile process so that each time you refer to variable alpha,
address 0x0483998 is used, each time you refer to beta, beta's own address is
used, and so on.  There's other information in the symbol table that Mata
needs, too, but it's important to understand that all that information is
needed only while the function is being compiled.  Once the compiled program
is generated, the symbol table is thrown away.

Anyway, we arbitrarily made the size of that table 200.  We chose 200
because we thought 200 would be large enough that no one would ever
encounter the limit, or ask about it.  Evidently, the limit needs to be
increased.

That, however, cannot happen quickly enough to solve John's problem, so
let me offer a workaround that I hope will also be taken as good
advice:  Write short programs that call lots of subroutines.

Stata is a large program containing over one million lines of code. 
Yet, in all that code, there is not one program that declares more than 
50 variables, and most declare around 8.  How does that work?
The top-level part of Stata's code, were it written in Mata, would look like this:

       void stata()
       {
              struct instance inst

              initialize_struct(inst)

              initialize_stata(inst)
              run(inst)
              shutdown(inst)
       }

       void run(struct instance inst)
       {
              real scalar   finished

              // keep reading and executing lines until xeq_line() returns nonzero
              for (finished=0; !finished;) {
                     get_line_from_current_input(inst)
                     finished = xeq_line(inst)
              }
       }

I admit that I've omitted some details from the routines above, but I'm
omitting a lot fewer details than most people would guess.  There are only
another 20 to 50 lines in each.
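For readers who want something they can actually run, here is a minimal Mata sketch with the same shape as stata() and run() above.  Everything in it is hypothetical and for illustration only: the struct session type, its members, and the functions session_init(), session_step(), and session_run() are invented here, and "executing" a line merely counts it.  The point is the pattern: one struct carries the shared state, so each routine declares only a couple of locals of its own.  Because Mata passes arguments by address, the changes session_init() and session_step() make to s are seen by the caller.

```mata
mata:

// Hypothetical state object bundling what the routines share,
// in the spirit of struct instance above
struct session {
        string colvector todo       // lines to be processed
        real scalar      pos        // index of the next line
        real scalar      n_done     // lines "executed" so far
}

// Fill in the shared state before the main loop starts
void session_init(struct session scalar s, string colvector lines)
{
        s.todo   = lines
        s.pos    = 1
        s.n_done = 0
}

// Handle one line; return 1 when there is nothing left to do, else 0
real scalar session_step(struct session scalar s)
{
        if (s.pos > rows(s.todo)) return(1)
        if (s.todo[s.pos] == "exit") return(1)
        s.n_done = s.n_done + 1
        s.pos    = s.pos + 1
        return(0)
}

// Driver: only two locals, however much state the session holds
real scalar session_run(string colvector lines)
{
        struct session scalar s
        real scalar           finished

        session_init(s, lines)
        for (finished = 0; !finished; ) {
                finished = session_step(s)
        }
        return(s.n_done)
}

end
```

With this layout, adding another piece of shared state means adding a struct member, not another local in every routine that needs it.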

The initialize_stata(inst) routine looks like this:

       void initialize_stata(struct instance inst)
       {
              initialization_part1(inst)
              initialization_part2(inst)
       }

Routine initialization_part1(inst) handles the low-level details of
launching Stata.  By the time it finishes, we have a screen and we have
the ability to output to it, something we didn't have before
initialization_part1() ran.  Stata itself is still largely unborn.
We haven't even put out the opening message yet.

I admit that initialization_part1() is a longer routine than any I've shown so
far.  It contains 50, maybe 100 lines.  And it, too, has subroutines -- lots
of them -- because initialization_part1() is a complicated process.  Each
subroutine, however, is short and handles one particular aspect of the problem.

I use Stata as an example because it is such a big system, and even so,
Stata never comes close to needing 200 variables in any one routine.

The numeric-code components of Stata such as matrix inverters, linear 
regression solvers, etc., do tend to be longer and to use more 
variables simultaneously.  As I mentioned, we get up to 50.  I say 50, but I
admit I haven't looked, yet I feel comfortable making the claim.  I can
certainly think of routines with 20 variables.  There might be one with 30 or
40, so I said 50.

My point is that you can live within the 200 limit and that if you do
that, you will actually find your code easier to write and easier to
maintain.  Well, you'll find your code easier to write only after
you get through the rewrite of your big routine.
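As a minimal Mata sketch of that kind of rewrite, assume a hypothetical task of summarizing the columns of a matrix.  Rather than one routine that declares every intermediate result, a short driver calls small helpers, and no routine declares more than one local.  The names col_means(), col_sds(), and summarize_cols() are invented for illustration, and Mata's built-in mean() and variance() already do this job; the point is only the shape of the code.

```mata
mata:

// Helper: column means of X (no locals needed)
real rowvector col_means(real matrix X)
{
        return(colsum(X) :/ rows(X))
}

// Helper: column standard deviations of X, using the n-1 divisor
real rowvector col_sds(real matrix X)
{
        real rowvector mu

        mu = col_means(X)
        return(sqrt(colsum((X :- mu):^2) :/ (rows(X) - 1)))
}

// Driver: declares no locals of its own; returns a 2 x k matrix with
// means in row 1 and standard deviations in row 2
real matrix summarize_cols(real matrix X)
{
        return(col_means(X) \ col_sds(X))
}

end
```

Each function is compiled separately, so the symbol-table limit is checked against each small routine rather than against the whole computation.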

That said, we will increase the symbol-table size.  I expect we will have 
that out in July.

-- Bill
wgould@stata.com



© Copyright 1996–2014 StataCorp LP