
From: wgould@stata.com (William Gould, StataCorp LP)

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Mata "no room to add more symbols" error

Date: Thu, 18 Jun 2009 08:59:57 -0500

John Bates <lirac271@yahoo.com> asks,

> I am running into a weird problem with Mata. Is there a limit to how many
> variables you can declare inside a function?

Yes. The current limit is 200.

John goes on to explain,

> When I now add the following line to the top of my main function:
>
>         real scalar newest_var
>
> I get the following error when I try to compile (the error occurs a few
> hundred lines down after the declaration):
>
>         no room to add more symbols
>         r(3000);

John may wonder whether, because the error occurred a few hundred lines after the declaration, there is something else going on. There is not. The error is reported at the point where the table actually runs out of room, and that can be well after the declaration that used up the last free slot.

John mentioned that eliminating one variable in his big program "resolves the problem", but went on to note, "I do not get an error if I add additional variables in my other (much smaller) functions". John is obviously surprised at that. The explanation is that each function is compiled separately.

Mata is a compiler. The term "compiler" is loaded with meaning. Compilers translate one computer language to another. The Mata compiler translates Mata source code to numeric codes that later can be executed very rapidly.

In the process of performing the compilation, Mata must make a list of the variables and functions used inside the program and assign to each a physical address in memory. That list is called a symbol table. The symbol table is used during the compile process so that each time you refer to variable alpha, address 0x0483998 is used; each time you refer to beta, beta's address is used; and so on. There's other information in the symbol table that Mata needs, too, but it's important to understand that all that information is needed only while the function is being compiled. Once the compiled program is generated, the symbol table is thrown away.

Anyway, we arbitrarily made the size of that table 200. We chose 200 because we thought 200 would be large enough that no one would ever encounter the limit, or ask about it. Evidently, the limit needs to be increased.
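To make the mechanism concrete, here is a toy model of a fixed-capacity, per-function symbol table, written in Python rather than Mata. It is only a sketch under stated assumptions: every distinct name (variable or function) takes one slot the first time it appears, declarations included; each function starts with a fresh, empty table; and the capacity is 200. `CAPACITY`, `compile_function`, and `SymbolTableFull` are hypothetical names, and Mata's real table stores more than this model does.

```python
# Toy model of a per-function symbol table with a fixed capacity.
# CAPACITY, compile_function, and SymbolTableFull are hypothetical names;
# this is an illustration, not Mata's actual implementation.

CAPACITY = 200  # mirrors the 200-entry limit described in the post


class SymbolTableFull(Exception):
    """Raised when a function needs more symbols than the table can hold."""


def compile_function(lines, capacity=CAPACITY):
    """Enter each name into a fresh symbol table, in order of first use.

    `lines` is a list of source lines, each given as the list of variable
    and function names that line mentions. Returns the number of symbols
    the function needed; raises SymbolTableFull at the line where the
    table overflows.
    """
    table = {}  # name -> made-up address; a new, empty table per function
    for lineno, names in enumerate(lines, start=1):
        for name in names:
            if name in table:
                continue  # already has an address; reuse it
            if len(table) == capacity:
                raise SymbolTableFull(
                    f"no room to add more symbols (line {lineno}, {name!r})")
            table[name] = 0x1000 + len(table)
    return len(table)  # the table itself is discarded here


# A "big" function: 195 declarations, then a body that first mentions
# five functions (sqrt, sum, rows, cols, max) a few lines further down.
declarations = [[f"v{i}"] for i in range(195)]
body = [["v0", "sqrt"], ["v1", "sum"], ["v2", "rows"],
        ["v3", "cols"], ["v4", "max"]]
big = declarations + body
print(compile_function(big))  # 200: the table is exactly full

# One more declaration at the top: the overflow is reported at the
# later line where the 201st name first appears, not at the declaration.
try:
    compile_function([["newest_var"]] + big)
except SymbolTableFull as err:
    print(err)  # no room to add more symbols (line 201, 'max')

# A small function gets its own table, so an extra variable is harmless.
print(compile_function([["x"], ["y"], ["newest_var"]]))  # 3
```

In this model the new declaration consumes the last free slot, and the error surfaces wherever the next never-before-seen name shows up, which matches both of John's observations.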
That, however, cannot happen quickly enough to solve John's problem, so let me offer a workaround that I hope will also be taken as good advice: write short programs that call lots of subroutines.

Stata is a large program containing over one million lines of code. Yet, in all that code, there is not one program that declares more than 50 variables, and most declare around 8. How does that work? The beginning part of Stata's code, were it written in Mata, looks like this:

    void stata()
    {
            struct instance   inst

            initialize_struct(inst)
            initialize_stata(inst)
            run(inst)
            shutdown(inst)
    }

    void run(struct instance inst)
    {
            real scalar   finished

            for (finished=0; !finished;) {
                    get_line_from_current_input(inst)
                    finished = xeq_line(inst)
            }
    }

I admit that I've omitted some details from the routines above, but I'm omitting a lot fewer details than most people would guess. There are only another 20 to 50 lines in each.

The initialize_stata(inst) routine looks like this:

    void initialize_stata(struct instance inst)
    {
            initialization_part1(inst)
            initialization_part2(inst)
    }

Routine initialization_part1(inst) handles the low-level details of launching Stata. By the time it finishes, we have a screen and the ability to output to it, something we didn't have before initialization_part1() ran. Stata itself is still largely unborn; we haven't even put out the opening message yet.

I admit that initialization_part1() is a longer routine than any I've shown so far. It contains 50, maybe 100 lines. And it, too, has subroutines, lots of them, because initializing Stata is a complicated process. Each subroutine, however, is short and handles one particular aspect of the problem.

I use Stata as an example because it is such a big system, and even so, Stata never comes close to needing 200 variables in any one routine. The numeric-code components of Stata, such as matrix inverters and linear-regression solvers, do tend to be longer and to use more variables simultaneously.
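Python's compiler also builds one symbol table per function, and CPython exposes those tables through the standard-library `symtable` module, so it offers a quick, if indirect, way to watch what splitting a routine into subroutines does to per-function table sizes. Treat this strictly as an analogy: Python has no 200-symbol limit, and Mata counts symbols by its own rules. The functions `summarize`, `center`, and `spread`, and the helper `symbols_per_function`, are made up for illustration.

```python
import symtable

# Before: one routine does all the work, so every name shares one table.
BIG = """
def summarize(data):
    n = len(data)
    total = sum(data)
    mean = total / n
    lo = min(data)
    hi = max(data)
    span = hi - lo
    return mean, span
"""

# After: the same results computed by a short driver and two subroutines.
SPLIT = """
def summarize(data):
    return center(data), spread(data)

def center(data):
    total = sum(data)
    return total / len(data)

def spread(data):
    return max(data) - min(data)
"""


def symbols_per_function(source):
    """Return {function name: number of names in that function's table}.

    Names include locals, parameters, and any functions the body calls,
    since each of those needs an entry in the function's symbol table.
    """
    module = symtable.symtable(source, "<example>", "exec")
    return {child.get_name(): len(child.get_identifiers())
            for child in module.get_children()}


big = symbols_per_function(BIG)
split = symbols_per_function(SPLIT)
print("one big routine:", big)
print("split routines: ", split)
```

The total work is unchanged, but no single table has to hold every name at once, which is the property that keeps each routine comfortably under a fixed limit.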
As I mentioned, we get up to 50. I say 50, but I admit I haven't looked; still, I feel comfortable making the claim. I can certainly think of routines with 20 variables. There might be one with 30 or 40, so I said 50.

My point is that you can live within the 200 limit, and that if you do, you will actually find your code easier to write and easier to maintain. Well, you'll find your code easier to write only after you get through the rewrite of your big routine.

That said, we will increase the symbol-table size. I expect we will have that out in July.

-- Bill
wgould@stata.com

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

