Statalist The Stata Listserver



Re: st: How strict is -set matastrict-?


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How strict is -set matastrict-?
Date   Wed, 24 May 2006 08:56:40 -0500

Joseph Coveney <jcoveney@bigplanet.com> asks, 

> I noticed that in Mata you can willy-nilly change the organizational type of
> a declared variable, including one that is not declared a transmorphic
> matrix, and even with -mata set matastrict on- [...]  It does lend
> some flexibility, but I'm surprised that Mata lets you do it.  Is the
> philosophy behind this covered in the user's manual?  I haven't run across
> it.  Is it a why-not philosophy?
>
> Also, what is the intention of having three distinct vector organizational
> type definitions (vector, rowvector and colvector)?  [...]

There are three questions here:  (1) willy-nilly change of organizational
type, (2) matastrict, and (3) vector, rowvector, and colvector.


Willy-nilly change of organizational type
-----------------------------------------

Joseph is asking about orgtypes.  A Mata type (such as -real scalar-) is 
made of an eltype and an orgtype which, together, are referred to as the 
variable's type.

Let me begin by talking about eltypes.  Until recently, Mata was pretty
tolerant of willy-nilly changes in eltypes.  The more recent updates are
less tolerant of such changes than previous versions:

        : function example(y)
        > {
        >         string scalar   x
        >
        >         x = rows(y)
        type mismatch:  string = real not allowed

The above is with -matastrict- on or off.

Older Matas would have compiled the above without error or comment.  The code
would have executed correctly.  The new compiler pays more attention and
thereby produces more efficient code.

Future versions may be even more intolerant.  

The more recent versions of Mata are more restrictive concerning eltypes.  
They are still pretty relaxed about orgtypes.  You can imagine how paying
attention to eltypes could be used to produce more efficient code.  Concerning
orgtypes, there is not as much information that can be exploited, except to
distinguish between scalar and nonscalar.

The compiler enforces what it needs.  Those needs will become more restrictive
over time.


-matastrict-
------------

-matastrict- makes the compiler "strict".  You are required to declare 
the variables you use.

The purpose of -matastrict- is *NOT* to flag constructs (1) that may be 
questionable or (2) that may cause the compiler to produce less efficient 
code.  Those warnings occur whether -matastrict- is on or off:

        : function bill()
        > {
        >         real scalar     x
        >
        >         secondfcn(x)
        >         x = 3
        > }
        note: variable x may be used before set

The above code is probably, but not certainly, incorrect.  Hence, Mata 
issues a warning message; it does not refuse to compile the function.

The purpose of -matastrict- is to refuse to compile your function if you fail
to meet some formal rule.  

At StataCorp, we love formal rules.  We have discovered that obeying them
reduces the chances of bugs.  Thus, we may add more rules to the
declare-your-variable rule and have -matastrict- enforce them in the future.
Right now, however, we have nothing in mind, and I cannot imagine what
those new rules would be.

Joseph might suggest that -matastrict- flag willy-nilly changes of
organizational type.  That would not be a bad idea.  The idea is so good,
however, that I would argue Mata should flag such changes as errors whether
-matastrict- is on or off.  Moreover, by the time we get around to flagging
such questionable constructs, we at StataCorp will want to exploit the
information to produce even more efficient code, and so then we will have to
flag them as errors.

No such change is in the works, nor will it happen soon.  Such constructs are
devilishly difficult to detect.  Note that it is not necessarily an error to
assign a colvector to a rowvector:

                x = somefcn(y)

In the above, assume x is a rowvector and somefcn() returns a colvector.
The colvector returned might be 1x1 (a scalar), and that meets the definition
of both a rowvector and colvector.  Joseph's example is easier for the eye to
spot:

                A = (1, 2, 3)
                A = A'

but still difficult for the computer to detect.  A=A' is not necessarily an 
error; it's just unnecessary if A is declared as a rowvector.  What makes the
statement an error are the two statements taken together.

In any case, the one check currently performed by -matastrict- is worth a lot.
It is too easy to type -d- when you mean -s- (they are next to each other on
the keyboard), etc.
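
A minimal sketch of that check at work.  The function name addup() is made
up for illustration; the behavior described in the comments follows from the
declare-your-variables rule stated above:

        mata set matastrict on

        real scalar addup(real vector x)
        {
                real scalar     i, s    // every variable used is declared

                s = 0
                // Had -d- been typed for -s- on the left of the next line,
                // -d- would be an undeclared variable and, with -matastrict-
                // on, the function would be refused rather than compiled.
                for (i=1; i<=length(x); i++) s = s + x[i]
                return(s)
        }

        addup((1, 2, 3))

With -matastrict- off, the mistyped version would be expected to compile,
with -d- quietly created as a new variable and -s- never updated.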


Vectors
-------

Mata provides three vector organizational types:  -rowvector-, -colvector-,
and -vector-.  The last just means -rowvector- or -colvector-.

Say you wish to write a subroutine to return the length (geometric definition)
of a vector:

        real scalar veclen(real vector x)
        {
                real scalar        i, sum

                sum = 0         // declared scalars start out missing
                for (i=1; i<=length(x); i++) sum = sum + x[i]^2
                return(sqrt(sum))
        }

Note that I declared veclen() to take a -real vector-.  I could have declared
veclen() to take a -real colvector-, or a -real rowvector-, but declaring
-real vector- is better.

In my application, I may intend to use the subroutine with a -rowvector-, but
that is no reason to make -veclen()- restrictive.  Why should I, the day
I wish to use veclen() in another application, have to code

                length = veclen(v')

I would have to do that if I coded -veclen- to take a -real rowvector- today
and want to use it with a -real colvector- tomorrow.

Thus, -vector- used in argument declarations is a way of communicating that
the requirement is merely that the argument be a vector; function veclen()
can be used with -rowvector-s or -colvector-s.
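
A minimal sketch of that point.  The name euclen() is made up here so that
the example stands on its own, and the accumulator is set to 0 on the
assumption that a declared scalar starts out missing:

        real scalar euclen(real vector x)
        {
                real scalar     i, ss

                ss = 0
                for (i=1; i<=length(x); i++) ss = ss + x[i]^2
                return(sqrt(ss))
        }

        euclen((3, 4))          // 1 x 2 rowvector
        euclen((3 \ 4))         // 2 x 1 colvector

Neither call needs a transpose, and both compute sqrt(3^2 + 4^2) = 5.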

In another function, I may wish to take a -vector- and, someplace in the 
code, need to temporarily keep a copy:

        function ...(..., vector v, ...) 
        {
                ...
                vector        copy
                ...

                ...
                copy = v
                ...
        }

Thus, -vector- used in variable declarations arises because of -vector- in 
argument declarations.
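
One concrete, purely illustrative instance of the pattern.  The function
flipvec() is hypothetical; it keeps a copy of its argument and fills the copy
with the elements in reverse order:

        real vector flipvec(real vector v)
        {
                real vector     copy
                real scalar     i, n

                copy = v                // copy takes v's shape, row or column
                n = length(v)
                for (i=1; i<=n; i++) copy[i] = v[n-i+1]
                return(copy)
        }

        flipvec((1, 2, 3))      // rowvector in, rowvector out
        flipvec((1 \ 2 \ 3))    // colvector in, colvector out

Because both the argument and the copy are declared -vector-, the same code
serves either orientation.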

The way I have written this, it seems that -vector- is just about
documentation.  That's not true.  I remind you, having to code

                length = veclen(v')

would be inefficient.  Using -vector- appropriately allows you 
to write more efficient code.  From the Mata compiler's point of view, all
vectors are equally efficient because they are all addressed the same way.  It
is just a matter of interpretation, and one does not want that interpretation
to lead to extra work.

-- Bill
wgould@stata.com


