Re: st: How to define an external Mata class within the namespace of an ado-file


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to define an external Mata class within the namespace of an ado-file
Date   Mon, 13 Dec 2010 12:14:09 -0600

Joseph Coveney <jcoveney@bigplanet.com> asks about ado-files that make 
use of a parent (base) Mata class, and about allowing the user 
to extend the class without having to modify the original ado-file.
Joseph writes, 

> The problem is that the ado-file's Mata code is run within the namespace of
> the ado-file:  the parent/base class is not visible to the external Mata
> code, and so the external Mata file cannot extend it.
>
> My workaround so far is to use -include- within the body of the ado-file's
> ado program code, as shown below (it's simplified).  This seems to work, but
> the help file for -include- admonishes against this: see -help
> include##remarks3- .

The relevant part of Joseph's ado-file reads, 


-------------------------------------------- gjps.ado ------
        program define gjps, eclass
           version 11.1
           syntax /* . . . */, [EXTernal(string)]

           if "`external'" != "" include `"`external'"'

           mata:Controller::fit()
        end

[...]
-------------------------------------------- gjps.ado ------

There are potential problems with this approach.

I don't think the problem mentioned in the help file is an issue.  The help
file emphasizes that -include- is executed when the line containing it is
executed -- known as run time -- rather than when the ado-file is loaded --
known as load time or compile time.  In this case, Joseph knows that and, in
fact, that is exactly what he wants.

Problems may arise, however, if the program -gjps- is re-executed, because the
-include `"`external'"'- will be re-executed each time.
First, this will be taxing to Stata's memory management routines, although I
think they will handle the situation correctly.  I am worried that when the
time comes to remove gjps.ado from memory, the memory-management routines may
become confused if -gjps- has been run repeatedly.  If they do become
confused, they will err on the side of not dropping everything they should.
Thus my concern is merely that a little memory might not be free'd.  If that
were to occur, I doubt anyone would notice.  The memory would certainly be
free'd the next time someone did a -clear all-.

The second issue has to do with what would happen if -gjps- held on to values 
between being invoked by separate -gjps- commands, 
which I do not think is the case here.  Imagine a command like 
-ml-, where the command is given at one stage to set up the problem, and then 
later to provide more setup, and then finally again later to execute the 
problem.  Obviously, command -ml- is holding on to information between
invocations.  In this case, there would be an object that is being maintained
between calls.  If the definition of the object were changing because 
the code re-executed the definition instructions, problems could
arise, and I will explain how below.  In some cases, it is important that the
-include- statement be executed only once per problem.
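
As a loose analogy only, here is a small Python sketch of the second issue:
an object created under one run of a definition outlives a re-run of that
definition.  The names Problem and define_class are hypothetical, and the
sketch makes no claim about how Mata itself would react.

```python
def define_class():
    """Stand-in for re-executing the instructions that define a class."""
    class Problem:
        def __init__(self):
            self.steps = []
    return Problem


Problem = define_class()       # first invocation: the definition is created
state = Problem()              # ... and an object is set up and kept
state.steps.append("setup")

Problem = define_class()       # later invocation re-runs the definition

# The kept object was built from the earlier definition, so it does not
# belong to the class that now carries the same name.
same_definition = isinstance(state, Problem)
```

In the sketch, same_definition ends up False even though both definitions
are textually identical, which is the kind of mismatch that executing the
definition only once per problem avoids.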

Anyway, the right way to handle all of the above problems is to split the
class definition out from the ado-file and store it separately, as a global,
in compiled form, in a library.  This will sever the inappropriate connection
between the memory management of the ado-file and the (possibly
user-extended) class.

With the class defined in a library, the user can extend the class even before
using -gjps-, or after, and there will never be confusion.  The class will not
be in the name space of the ado-file -- it will be global -- but I don't see
that it will matter and, as I just hinted, I see that as an advantage.  In
terms of naming convention, I would suggest the class be given the same name
as the ado-file, namely -gjps-, and that subclasses all begin with -gjps_-.
Thus, no extra or unexpected name is chosen.

I may not be understanding what Joseph is asking, and/or it may be that what I
am suggesting is overkill.  I'm guessing that when Joseph is talking about
extending the class, he is talking about virtual functions.  That is, I'm
guessing that Joseph is imagining calling a function of class Likelihood that
actually results in a function of a subclass being executed.  Said function
might be of class Logistic which extends class Likelihood, or even of class
somethingelse that extends class Logistic (that extends class Likelihood).  Do
I have that right?  If I do, then I am suggesting the classes be named
globally, with the names gjps and gjps_logistic.  The user could call the next
subclass whatever they wish, and it would extend gjps or gjps_logistic.
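
To make the virtual-function idea concrete, here is a minimal sketch in
Python rather than Mata.  The class names follow the naming suggestion
above; gjps_user, the toy one-parameter likelihood, the grid search in
fit(), and the penalty are all assumptions made for illustration, not
Joseph's code.

```python
import math


class gjps:
    """Base class; plays the role of class Likelihood in the text."""

    def evaluate(self, theta):
        # The "virtual" function: each subclass supplies its own version.
        raise NotImplementedError("a subclass must supply evaluate()")

    def fit(self, candidates):
        # Written once in the base class.  Which evaluate() runs depends
        # on the actual class of the object, not on this code.
        return max(candidates, key=self.evaluate)


class gjps_logistic(gjps):
    def __init__(self, y, x):
        self.y, self.x = y, x

    def evaluate(self, theta):
        # Bernoulli log likelihood, logit link, one slope, no intercept.
        ll = 0.0
        for yi, xi in zip(self.y, self.x):
            p = 1.0 / (1.0 + math.exp(-theta * xi))
            ll += math.log(p) if yi else math.log(1.0 - p)
        return ll


class gjps_user(gjps_logistic):
    """Hypothetical user extension, written without editing the code above."""

    def evaluate(self, theta):
        # Same likelihood with a quadratic penalty on theta.
        return super().evaluate(theta) - 0.5 * theta ** 2


y, x = [1, 0, 1, 1], [1.0, -1.0, 2.0, 0.5]
plain = gjps_logistic(y, x).fit([0.0, 1.0, 3.0])
penalized = gjps_user(y, x).fit([0.0, 1.0, 3.0])
```

Both calls go through the same fit(), yet they pick different candidates
because each object brings its own evaluate().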

I hope this is helpful.

In case it is not, let me back up and explain exactly what -include- 
does, and lots more besides.

-include- is the same as the -do- and -run- commands.  FYI, -do- and -run-
are, deep inside Stata, the same command.  They differ merely because one is
noisy and the other is quiet, and that is handled by an argument.  -include-
is also the same command as -do-/-run-, the difference being that the
current value of the quietly/noisily flag is passed to the internal routine.

All three commands -- -do-, -run-, and -include- -- are really the same
command.  There is nothing special about any one of them.  What I want to
emphasize in this discussion is that -do-/-run-/-include- are run-time
commands of Stata, not load-time commands of Stata.  I want to explain all the
implications of that statement.

Ado-file memory management (the automatic loading and clearing of memory
associated with an ado-file) is based on actions that occur at the time the
ado-file is loaded.  Ado-file execution (as opposed to loading) has nothing to
do with this automatic memory management, except that ado-file execution
resets the timer that controls when the memory used to store ado-file
definitions is cleared and discarded.  If the timer ever should go off, and
the ado-file is thus cleared, then the next time the ado-file is run, the
ado-file is run through the automatic-load procedure again.

Let me explain the process in detail, so in case I haven't answered Joseph's
question, he can answer it for himself.

Pretend the user types 

        . gjps ...

Here is what happens:

       if (!exists(program gjps)) {
            find file gjps.ado
            open for write a new internal memory area called gjps
            /* All definitions will now be stored in gjps rather than 
               the usual global memory area
            */
            capture noisily run gjps.ado 
            if (_rc!=0) {
                  close memory area gjps 
                  delete memory area gjps
                  exit(_rc)
            }
            close memory area gjps
       }
       open for read memory area gjps
       /* All references to external definitions will now be found 
          in gjps.  If definitions are not found there, Stata
          looks in the global area
       */
       capture noisily execute program gjps
       close memory area gjps
       exit(_rc)
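
The outline above can be mimicked with a small Python model.  This is a
sketch for thinking about the logic, not Stata's implementation: the
dictionaries, the discard() function standing in for the timer, and the
contents given to the hypothetical gjps_ado are all assumptions.

```python
GLOBAL_DEFS = {}   # definitions visible to everyone (e.g., from a library)
ADO_AREAS = {}     # command name -> that ado-file's private definitions
LOADS = []         # log of automatic loads, kept only for illustration


def run_command(name, ado_file):
    """Mimic typing a command: load its ado-file if needed, then execute."""
    if name not in ADO_AREAS:
        area = {}            # new private area, open for write
        ado_file(area)       # run the ado-file; on error the area is dropped
        ADO_AREAS[name] = area
        LOADS.append(name)
    area = ADO_AREAS[name]

    def find(symbol):
        # Private definitions first, then the global area.
        return area[symbol] if symbol in area else GLOBAL_DEFS[symbol]

    return find(name)(find)  # execute the program


def discard(name):
    """Mimic the timer going off: the private area is cleared."""
    ADO_AREAS.pop(name, None)


def gjps_ado(area):
    # Hypothetical contents of gjps.ado: the program plus a private helper.
    area["gjps"] = lambda find: find("fit")() + " / " + find("report")()
    area["fit"] = lambda: "private fit"


GLOBAL_DEFS["fit"] = lambda: "global fit"
GLOBAL_DEFS["report"] = lambda: "global report"

first = run_command("gjps", gjps_ado)    # loads, then executes
second = run_command("gjps", gjps_ado)   # already loaded: executes only
discard("gjps")
third = run_command("gjps", gjps_ado)    # loaded again after being cleared
```

All three runs resolve fit to the private definition and report to the
global one, and the load step happens only on the first and third runs.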

Note that the private memory area contains only definitions.  Definitions
include Stata -program-s, Mata -function-s, and definitions of Mata -struct-s
and -class-es.  Definitions do not include that x happens to take on the
value 3.  That x is a such-and-such and takes on the value 3 is called an
instance, not a definition.

The private memory area described in the code above is not used to record
instances.  I'm going to show you how instances are stored, but first I 
want to be sure you are clear on the distinctions between a definition 
and an instance.

Examples of instances include the names and values of Stata's macros, the
names and values of Mata's built-in types such as -real scalar-, and the names
and values of Mata's structures and classes.  For instance, when you code

        local x = 3

you are creating an instance of a local macro, and storing 3 in it.  What is
meant by "local macro" is called a definition, in this case, "local macro"'s
definition.  That x is a local macro and is equal to 3 is called its instance.

This distinction between definition and instance is more important, and more
obvious, in the case of classes and structures.  Consider x which is a class
Likelihood.  What is meant by class Likelihood is contained in class
Likelihood's definition.  That x is a class Likelihood and the values taken on
by all of its different elements is called x's instance.

Definitions and instances are stored separately and managed separately.
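
The same distinction exists in most object-oriented languages.  A minimal
Python illustration, in which the member names value and label are made up:

```python
class Likelihood:
    """The definition: what it means to be a Likelihood."""

    def __init__(self, value, label):
        self.value = value
        self.label = label


# Two instances: each has its own name and its own member values.
x = Likelihood(3, "first")
z = Likelihood(7, "second")

x.value = 4   # changes x's instance only

shares_definition = type(x) is type(z) is Likelihood
```

There is one definition and there are two instances.  Changing x touches
neither z nor the definition of Likelihood.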

In the above code that loaded an ado-file, I showed you how definitions 
are created and stored.  I didn't show you how they are deleted, but 
I implied it:  When the memory area's timer goes off, the memory area is 
free'd.

Instances, on the other hand, are created, stored, used, and destroyed by the
code that implements Stata's -do-, -run-, -include-, and execute (execute does
not have dashes around it because it is not a command of Stata, it is a
component of Stata, which I used in the code fragment above).

      function stata_do/run/include/execute()
      {
           set noisily/quietly
           open a read/write memory area for instances
           low_level_execute whatever we've been told to execute
           close read/write memory area for instances
           delete read/write memory area for instances
           reset noisily/quietly
           return(return code for low_level_execute)
      }

Thus, instances exist only as long as the ado-file is actually running.
Instances are always, without exception, destroyed when the ado-file 
completes.  
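
The lifetime rule can be sketched in Python with a context manager.  This is
a model of the outline above, not Stata's code, and instance_area, execute,
live_areas, and the two body functions are hypothetical names.

```python
from contextlib import contextmanager

live_areas = []   # instance areas currently open, innermost last


@contextmanager
def instance_area():
    area = {}                  # open a read/write area for instances
    live_areas.append(area)
    try:
        yield area
    finally:
        area.clear()           # destroy every instance created inside
        live_areas.pop()       # and delete the area itself


def execute(body):
    """Stand-in for do/run/include/execute: wrap the body in an area."""
    with instance_area() as area:
        return body(area)


seen = []                      # keeps references so we can inspect afterwards


def gjps_body(area):
    area["x"] = 3              # an instance: x exists and holds 3
    seen.append(area)
    return area["x"] + 1


def failing_body(area):
    area["y"] = 1
    seen.append(area)
    raise RuntimeError("error partway through")


result = execute(gjps_body)
try:
    execute(failing_body)
except RuntimeError:
    pass
```

After both calls, every area recorded in seen is empty and live_areas is
empty too: the instances are gone whether the body finished normally or
stopped with an error.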

If you think carefully through all of the above, you will understand the 
basis for my earlier comments that definitions might stack up and not 
be cleared when the ado-file is cleared.  Actually, this is a case 
where you might be confused, but Stata would not be.  Definitions are 
stored separately from instances.  There is only one time when definitions 
can be added to the private area of an ado-file, and that is when the ado-file
is loaded.  When an ado-file is executed, private definitions
take precedence over global definitions.  If something is defined only 
globally, however, that is fine, and it is used.  Now consider the case 
where something is defined privately and globally, say by the user 
including an option on the -gjps- command.  The private definition still 
takes precedence.  This will certainly lead to confusion.  Moreover, 
when code outside the system is executed, it's a different set of 
private definitions that are in effect.  Thus, it is possible that 
the outside-the-system code will use a different definition for what you 
assume is the same thing.
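
The precedence rule, and the confusion it can cause, can be modeled with
Python's collections.ChainMap, which searches a list of dictionaries in
order.  The strings are placeholders for definitions, and all names here are
assumptions for illustration.

```python
from collections import ChainMap

global_defs = {"Likelihood": "library version"}
gjps_private = {"Likelihood": "version compiled into gjps.ado"}
other_private = {}   # private area of some other, outside-the-system code

# Each piece of code sees its own private area first, then the global one.
inside_gjps = ChainMap(gjps_private, global_defs)
outside = ChainMap(other_private, global_defs)

# The user now supplies a new definition globally.
global_defs["Likelihood"] = "user's version"

seen_inside = inside_gjps["Likelihood"]   # private copy still wins
seen_outside = outside["Likelihood"]      # picks up the user's definition

# With no private copy at all, both views would agree.
del gjps_private["Likelihood"]
agree = inside_gjps["Likelihood"] == outside["Likelihood"]
```

While the private copy exists, the two views disagree about what Likelihood
means.  Once only the global definition remains, they agree.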

In simple, straightforward cases, what Joseph suggests will work.  In 
complicated cases, it is a recipe for confusion.

And that is why I suggested that the class be made global and its definition 
stored in a global library.  This way, no matter when, where, or how 
the user extends the definitions, there will be no confusion.  
The user, the programmer, and all routines, will be using the same set 
of global definitions.

-- Bill
wgould@stata.com