Re: st: Mata function saving conventions and best practices

From	[email protected] (William Gould, StataCorp LP)
To	[email protected]
Subject	Re: st: Mata function saving conventions and best practices
Date	Tue, 19 Jun 2007 12:10:17 -0500

David Elliott <[email protected]> writes,

> I've written a number of Mata routines but am unsure of the "best" way
> to compile them to object code. [...]
>
> Since Mata files are compiled into object code as *.mo files (or added
> into a mlib), if one wants to save the source code, there has to be an
> intermediate step of saving and then compiling the source (or
> compiling and saving the command history in the case of an interactive
> session).  In [M-1] source -- Viewing the source code we learn that
> the convention for Mata source is to save as a *.mata file.  [...]
> [...]
> So I'm interested in how others, including Stata, have handled the
> maintenance of source files, compiling to *.mo or *.mlib files.  [...]

Here is what we at Stata do:


1.  We save the mata code in .mata files
----------------------------------------

You can see examples of our .mata files using -viewsource-.  Try

        . viewsource bufio.mata

        . viewsource norm.mata 

        . viewsource mmat_.mata 

I want you to look at them solely for structure.  There are differences, 
but they have the following in common:

        ---------------------------------------- .mata file ---
        *! version #.#.#  <date>
        version 9                     <- or 9.1, or 9.2, or 10

        mata:
        <mata code appears here>
        end
        ---------------------------------------- .mata file ---


Some of the files are short, such as bufio.mata:

        ---------------------------------------- bufio.mata ---
        *! version 1.0.0  29jul2005
        version 9.0
        mata:

        colvector bufio() return(byteorder()\stataversion())

        end
        ---------------------------------------- bufio.mata ---


Most of our files define one function, but there are exceptions.
That's why I included norm.mata and mmat_.mata.  Those files 
define more than one function, but the functions are related.
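
As an illustration of a file that defines more than one related function, here is a minimal sketch that follows the same layout.  The file name mystats.mata and the functions mymean() and myvar() are hypothetical; they are not part of official Stata.

```stata
*! version 1.0.0  19jun2007
version 9
mata:

// hypothetical: mean of a real column vector
real scalar mymean(real colvector x)
{
        return(sum(x)/rows(x))
}

// hypothetical: sample variance, reusing mymean() from this same file
real scalar myvar(real colvector x)
{
        real scalar m

        m = mymean(x)
        return(sum((x:-m):^2)/(rows(x)-1))
}

end
```

Every variable is declared explicitly so that the file also compiles with matastrict set on.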


2.  We use a do-file to compile
-------------------------------

Actually, "we use a do-file to compile" is not literally true, and I'll get
to what we really do in (3).  What we do amounts to

        --------------------------------- makelib.do ----
        version 9.2                              <- or 10

        capture erase lexample.mlib

        mata: mata clear 
        mata: mata set matastrict on

        do bufio.mata 
        do norm.mata 
        do mmat_.mata

        mata: mata mlib create lexample, dir(.)
        mata: mata mlib add    lexample *(), dir(.)
        --------------------------------- makelib.do ----

Note the following:

    1.  The do-file assumes the .mata code is in the same directory as
        the do-file itself.

    2.  The do-file creates the new library in the same directory.
        (It can be copied to the appropriate place after creation.)

    3.  The do-file erases the existing .mlib file first thing.
        This way, if something goes wrong, no library exists.

    4.  We compile with -mata: mata set matastrict on-.  You may not want to 
        do that.  If not, change the line to read
        -mata: mata set matastrict off-.  Do not assume a setting.

    5.  Mata is cleared early on.  This way, later, we can refer to *() 
        to mean all the functions in memory.

    6.  On the -mata mlib- commands, we specify option -dir(.)- so that 
        the library is created in the current directory.

I draw your attention to (5).  This works with .mlib libraries, but would 
not work with .mo files.  Now I know why David asked yesterday if there 
was a way to get a list of the existing functions in memory.  I gave a
one-word reply, "No.", and I almost wrote, "No, because it is a bad idea", but
now that I see what David had in mind, namely generalizing something like
makelib.do to work with .mo files, I see why he asked and I'm glad I didn't
add, "because it is a bad idea".
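
For comparison, here is a minimal sketch of what a .mo version of makelib.do might look like, assuming you prefer one .mo file per function.  The file name makemo.do is hypothetical.  Each function is named explicitly on -mata mosave- because, as noted above, *() does not work with .mo files.

```stata
* makemo.do (hypothetical): save functions as .mo files instead of an .mlib
version 9.2

mata: mata clear
mata: mata set matastrict on

do bufio.mata

// one -mata mosave- line per function defined by the .mata files above
mata: mata mosave bufio(), dir(.) replace
```

The cost is that the do-file must be edited whenever a function is added, which is exactly what the *() shorthand avoids for libraries.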


3.  Well, we don't really use a do-file
---------------------------------------

Okay, I admit (2) is not exactly what we do, but I still recommend (2).
What we do is a variation on (2).

We have an ado-file that does what I outlined in (2).  The ado-file reads a
control file that lists the .mata files.  In fact, you can see that control
file:

        . viewsource lmatabase.maint

It looks something like this:

        ------------------------------------------------- lmatabase.maint ----
        *! version 1.0.5  05june2006

        * lmatabase.mlib library control file
        *
        * This file lists the *.mata files that are included in lmatabase.mlib

         assert
         asserteq
         e
        <lines omitted>
        bufio
        <lines omitted>
        * end of file
        ------------------------------------------------- lmatabase.maint ----

All the file says is that assert.mata needs to be compiled, and asserteq.mata,
and e.mata, etc.

Our ado-file knows to look in the official places for the individual .mata
files:  UPDATE and BASE.  So our ado-file, reading "assert", executes

        . do /usr/local/stata9/ado/base/a/assert.mata

and later, reading "bufio", executes 

        . do /usr/local/stata9/ado/updates/b/bufio.mata

At least, that's what it does on my Unix computer.  On your Windows 
computer, the lines would refer to assert.mata and bufio.mata in the
appropriate directories.
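
The following is not lmatabuild.ado.  It is a minimal sketch of how a control file could be read and acted on, written as a do-file and using only standard commands (-file read- and -findfile-).  The names buildfromlist.do, lme.maint, and lme are hypothetical, and the sketch assumes the control file holds one file name per line, with comment lines starting with an asterisk.

```stata
* buildfromlist.do (hypothetical): compile every .mata file named in a control file
version 9.2

local ctlfile "lme.maint"          // hypothetical control file
local libname "lme"                // hypothetical library name

capture erase `libname'.mlib
mata: mata clear
mata: mata set matastrict on

tempname fh
file open `fh' using "`ctlfile'", read text
file read `fh' line
while r(eof)==0 {
        local name = trim(`"`line'"')
        // skip blank lines and comment lines that start with *
        if "`name'" != "" & substr("`name'", 1, 1) != "*" {
                findfile `name'.mata       // searches the adopath, stores r(fn)
                do "`r(fn)'"
        }
        file read `fh' line
}
file close `fh'

mata: mata mlib create `libname', dir(.)
mata: mata mlib add    `libname' *(), dir(.)
```

Because -findfile- searches along the adopath, the same control file works regardless of which directory each .mata file lives in.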

The ado-file is called lmatabuild.ado, but you don't have that file.
We go to great care to delete lmatabuild.ado from the ado-directories
before shipping Stata.  That is not because lmatabuild.ado contains any 
secrets.  It is because, if you had it, you could overwrite lmatabase.mlib, 
and you don't want to do that, even by accident.  lmatabase.mlib is our 
responsibility and you want to be sure that, when you use an official 
Stata command or library, you really are using an official command or 
library.  Then you know who to blame.


4.  How we work
---------------

Let me continue this and tell you how we work.  Let's say I want to update 
bufio.mata because someone has reported a bug.  I start by working 
on my computer:

    1.  I create a new, empty directory.

    2.  I fire up Stata and type "which bufio.mata".  I learn that the 
        official bufio.mata is located in 
        /usr/local/stata9/ado/updates/b/bufio.mata on my computer.

    3.  I copy the file to my current directory.

    4.  I edit the file in my current directory.

    5.  In Stata, I type "adopath ++ ."  That tells Stata that 
        materials in my current directory take precedence over
        everything, including all of Stata's official stuff.

    6.  I use -lmatabuild- to rebuild the library.   Remember, 
        -lmatabuild- creates lmatabase.mlib in the current directory.

    7.  I type -mata: mata mlib index-.  That tells Mata to search 
        for .mlib files.  Now lmatabase.mlib in my current directory 
        is the one Stata will use.

    8.  I interactively test -bufio()- and convince myself that I have 
        fixed the bug.

After that, there is more I have to do.  I have to write a test script proving
the bug is fixed, and then I have to turn in the updated bufio.mata file and
the test script.  At StataCorp, someone else will then add the files to
the materials for the next official release, run full certification, and
eventually, before it reaches you, full certification will be run on every
platform Stata supports, using every version of Stata, from Small to /MP.


5.  Ideas
---------

Let's assume you wish to create and maintain Mata library lme.mlib.
There are ideas worth borrowing from us.  

Let me assume:

     1.  You have a directory that contains the .mata files for lme.mlib.
         You consider this directory your "official" copy of lme.mlib.

     2.  You have written a do-file called makelme.do that will read 
         the .mata files and create lme.mlib.

     3.  You have copied lme.mlib to your PERSONAL ado-file directory 
         so that you can use the lme.mlib materials whenever you use 
         Stata.

     4.  Perhaps the directory also contains test materials.  You 
         might have file test.do that tests the lme.mlib functions.
         test.do might itself be simple, calling other do-files
         to do the actual testing.  For instance, let's pretend 
         lme.mlib contains functions -me1()-, -meinv()-, and -favorite()-.
         test.do might call do-files testme1.do, testmeinv.do, and 
         testfavorite.do.  test.do might read 

                ------------------------------------ test.do ----
                version 9
                clear 
                adopath ++ . 
                mata: mata mlib index

                do testme1
                do testmeinv
                do testfavorite
                ------------------------------------ test.do ----

          This way, anytime you want, you can rerun the tests.  You 
          can run them individually, perhaps interactively, and you 
          can run them as a whole. 
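
As a sketch of what one of the individual test files might contain, here is a possible testme1.do.  Since -me1()- is only a pretend function, its behavior is assumed purely for illustration: the sketch supposes that me1(x) takes a real scalar and returns x+1.  Mata's assert() and asserteq() abort with an error when a check fails, which in turn stops the do-file.

```stata
* testme1.do (hypothetical): checks for the pretend function me1()
version 9
mata:

// assumed behavior, for illustration only: me1(x) returns x + 1
assert(me1(0) == 1)
asserteq(me1(2), 3)
asserteq(me1(-1), 0)

end
```

If every check passes, the do-file runs to completion silently, so a clean run of test.do means every test passed.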
 

Then here is one way you might work when you want to update lme.mlib:

     1.  You create a new directory.

     2.  You copy everything from your official lme.mlib directory to 
         the new directory.

     3.  You make whatever changes you want to make.

     4.  You test, perhaps interactively, and if you have test.do, 
         you run that, too.  Perhaps you even add to the test.do 
         suite.  

     5.  Satisfied, you carefully copy everything back to your 
         "official" directory.  (* I have comments on this.)

     6.  You put the new lme.mlib file in PERSONAL.

     7.  You erase the new directory you created.

Let me emphasize that there might be days between steps (1) and (7).  It is 
very important that you keep your original in a safe place and never 
work on it there.  Work elsewhere and, once finished, copy back.
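
The rebuild-and-test part of these steps can be sketched as the following commands, typed or run from inside the working copy.  The names makelme.do, test.do, and -me1()- are the pretend ones used above, and the sketch assumes your PERSONAL directory already exists.

```stata
* run from inside the working copy, never from the official directory
version 9.2

do makelme                          // rebuilds lme.mlib in the current directory
adopath ++ .                        // current directory now takes precedence
mata: mata mlib index               // make Mata search for .mlib files again
mata: mata which me1()              // report where me1() is being found
do test                             // rerun the whole test suite

* once satisfied (and after copying back to the official directory),
* install the new library in PERSONAL
copy lme.mlib "`c(sysdir_personal)'lme.mlib", replace
```

-mata which- is a quick way to confirm that the function being tested comes from the library in the working directory and not from an older copy elsewhere on the adopath.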

Concerning copying back (step 5), I heartily recommend the Unix 
commands -dircmp- and -diff-.  Unix and Macintosh computers have these
commands, and you can get similar utilities for Windows.  -dircmp- compares two
directories and tells you which files differ between them.  Finding the actual
differences between your official directory and your working directory is much
better than relying on memory.  -diff- will compare two files and show you,
line-by-line, how they differ.  I recommend doing this on every file -dircmp-
identifies.
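
If you prefer to stay inside Stata, the same comparison can be run through the shell escape.  This is a sketch assuming a Unix, Macintosh, or Cygwin setup where -diff- is on the path.  The directory names official and work are hypothetical, and -diff -rq- is used here as a substitute on systems that do not provide -dircmp-.

```stata
* compare the official and working copies from within Stata
* (directory names are hypothetical)

// list the files that differ between the two directories
!diff -rq official work

// then look at one of the reported files line by line
!diff official/me1.mata work/me1.mata
```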

I also recommend keeping backups.

Concerning -dircmp- and -diff- for Windows, perhaps someone on the list knows
more than I.  At StataCorp, we use Cygwin, which can be obtained for free from
http://www.cygwin.com.  We heartily recommend it but it provides a full
Unix-like environment and that may be more than you want.

-- Bill
[email protected]