Statalist The Stata Listserver



Re: st: Mata question


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Mata question
Date   Thu, 20 Apr 2006 10:10:07 -0500

I just answered the question

> Is there a Mata equivalent to Stata's -capture- statement?

from Daniel Hoechle <daniel.hoechle@gmail.com>, and I now see that 
Ben Jann <ben.jann@soz.gess.ethz.ch> chimed in, 

> I would be interested in that, too.

In case you haven't noticed, Ben is pretty proficient with Mata and is 
busy writing functions for all of us to use.  So I would like to write a 
second answer aimed at function writers on how functions should behave.
This will also be of interest to function consumers, because it will 
reveal what one can expect when approaching a new function for the first 
time.

Let's say you are writing function xyz().  The function has a set of 
requirements, let's call the set R, that must be met in order to perform 
its actions.  R might be that the input matrix is square and full rank, 
or that a file exists, etc.

There are three possible actions a function can take when a requirement 
is not met:

      A1.  Abort with error.

      A2.  Return a missing result with the appropriate number of rows and 
           columns.

      A3.  Return a special value that indicates problems.

The purpose of this posting is to outline when the function should do which.

Divide the requirements R into two subsets, R_1 and R_2.  R_1 is the subset
of requirements that are easy for the caller to verify.  R_2 is the 
remainder, the requirements that are difficult to establish before calling.

The following are the guidelines we try to follow:

     G1.  For all elements in R_1, action A1 is appropriate.

     G2.  For all elements in R_2, actions A2 or A3 are appropriate.

     G3.  For an element in R_2, action A1 is allowed, but then 
          there should be a corresponding _xyz() function that takes action 
          A2 or A3.

     G4.  For numerical functions, action A3 is to be avoided whenever 
          possible.  Action A2 is preferred.

     G5.  Action A3 is appropriate only for nonnumerical functions, but ...



G1.  For all elements in R_1, action A1 is appropriate
------------------------------------------------------

xyz() might require that input matrix A be square.  If the user wants 
to be robust to nonsquare matrices, it is easy enough to code 

         if (rows(A)==cols(A)) result = xyz(A)
         else {
                 // do something else
         }
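
From the function writer's side, the same requirement is handled by 
aborting.  A minimal sketch, in which the body of xyz() is only a stand-in 
calculation and the error text is illustrative:

```mata
mata:
real matrix xyz(real matrix A)
{
        // R_1 requirement: easy for the caller to verify, so abort (A1)
        if (rows(A) != cols(A)) _error("xyz(): square matrix required")

        return(A*A)             // stand-in for the real calculation
}
end
```
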

G2.  For all elements in R_2, actions A2 or A3 are appropriate
--------------------------------------------------------------

xyz() might require that input matrix A be positive definite.  It is
difficult for the caller to know whether A really is positive definite, and
therefore xyz() must take some action other than A1 when A is not positive
definite.

xyz() might require that input matrix A be full rank.  It is easy enough 
for the user to check that, 

          if (rank(A)!= rows(A)) ...

but look carefully at the documentation of rank().  Function rank() performs 
considerable computation in order to obtain its result.  Thus, full rank 
is considered R_2, not R_1.  

In most cases, that A is not full rank will be easily discovered in the code
of xyz() because there will be a division by zero, an unexpected negative
intermediate calculation, and the like.  If, however, xyz() would never
discover that A is not full rank in the natural order of things, it becomes 
even more important that xyz() check that A be of full rank lest xyz() 
return misleading results.

There is one exception to the above.  Suppose xyz() will be called 
repeatedly and it is desirable that xyz() be fast, and suppose xyz() is 
typically used along with a suite of other functions, all of which also 
require full rank.  Then xyz() does not want to waste time checking 
something that is likely to be true.  In such cases, if xyz() would 
return a misleading result with a non-full-rank matrix, xyz() should 
be renamed _xyz(), and the documentation should emphasize that it is 
the caller's responsibility to check that the matrix is full rank.

There are lots of other examples having nothing to do with matrices 
that fit into the above model, such as whether a file exists, a 
variable exists, etc.


G3.  For an element in R_2, action A1 is allowed, but ...
---------------------------------------------------------

It is often the case, especially in numerical subroutines, that the 
caller desires action A1.  In 99.9% of cases, the requirement (say 
positive definiteness) will be met, and in the 0.1% of cases where it 
isn't, the user never intended to write code to handle the case anyway.
Crashing out is a fine solution.

In that case, there needs to be a companion function _xyz() that does 
not take action A1.  Programmers implementing complicated systems 
need to be able to capture unlikely situations.

Think of guideline G3 as the escape clause for G2.  G3 allows you to 
ignore G2 and make an easy-to-use function xyz() for most callers.
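
A sketch of such a pair, using a matrix inverse as the stand-in 
calculation.  It assumes the built-in cholinv() returns a result of all 
missing values when A is not positive definite, as its documentation 
describes; the error texts are illustrative.

```mata
mata:
// _xyz(): takes action A2 for the R_2 requirement (positive definiteness)
real matrix _xyz(real matrix A)
{
        // squareness is R_1, so aborting (A1) is fine even here
        if (rows(A) != cols(A)) _error("_xyz(): square matrix required")

        return(cholinv(A))      // assumed all missing if A is not p.d.
}

// xyz(): the easy-to-use version; takes action A1 in every case
real matrix xyz(real matrix A)
{
        real matrix R

        R = _xyz(A)
        if (missing(R)) _error("xyz(): matrix not positive definite")
        return(R)
}
end
```

Most callers use xyz() and never think about the problem; programmers who 
need to handle the unlikely case call _xyz() and inspect the result.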


G4.  For numerical functions, action A3 is to be avoided whenever 
     possible.  Action A2 is preferred.
-----------------------------------------------------------------

In the case of numerical functions, A2 is the preferred action.
The returned result should contain missing values, be of the 
appropriate numerical type, and be of the appropriate dimension.

For instance, function xyz(A) might return A^(-1).  It might require 
that A be square and positive definite.  Action A1 would be appropriate 
for handling the square restriction.  To handle the second restriction, 
the appropriate action would be to return an n x n matrix of missing values.

The reason for this is that the caller can then ignore such issues if he or
she wishes.  Subsequent calculations will work because matrices will 
be conformable, but the missing values will propagate, just as they 
should.
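
To illustrate the propagation, here is a small interactive sketch.  It 
uses cholinv() in place of xyz() and again assumes cholinv() returns all 
missing values for a matrix that is not positive definite; the matrix 
below is symmetric with eigenvalues 3 and -1.

```mata
mata:
A    = (1, 2 \ 2, 1)    // symmetric, but not positive definite
Ainv = cholinv(A)       // 2 x 2; assumed to contain only missing values
b    = Ainv*(1 \ 1)     // conformable, so no abort; b is 2 x 1
missing(b)              // number of missing elements in b
end
```
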

        
G5.  Action A3 is appropriate only for nonnumerical functions, but ...
----------------------------------------------------------------------

Try to avoid returning special values, especially when they are mixed 
in with valid values.

Sometimes it is unavoidable.  In such cases, the function name should start
with an underscore.  Function _fopen() returns a positive or negative result.
A positive result is a file handle.  A negative result is a problem code.

Missing value is never considered a "special value", and this guideline does 
not apply to missing value.  Returning a missing value when requirements 
are not met is desirable.
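
As a sketch of how the two ideas fit together, the hypothetical function 
count_lines() below (the name and its purpose are made up for 
illustration) calls _fopen(), checks for a negative return, and returns 
missing rather than a special value of its own when the file cannot be 
opened:

```mata
mata:
real scalar count_lines(string scalar filename)
{
        real scalar   fh, n
        string matrix line

        fh = _fopen(filename, "r")
        if (fh < 0) return(.)   // negative result is a problem code

        n = 0
        // fget() returns J(0,0,"") at end of file
        while ((line = fget(fh)) != J(0, 0, "")) n++
        fclose(fh)
        return(n)
}
end
```
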

Concerning special values, when only special values are returned, the 
convention is that 0 indicates success.  1 might indicate failure, or 
different positive or negative values might be used to indicate the 
type of failure.  THIS IS DIFFERENT FROM THE CONVENTIONS USED IN 
MANY OTHER PROGRAMMING LANGUAGES, where 0 is often used to indicate 
failure.


-- Bill
wgould@stata.com