Thanks Bill. Very helpful.
ben
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
> statalist@hsphsun2.harvard.edu] On Behalf Of William Gould, Stata
> Sent: Thursday, April 20, 2006 5:10 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Mata question
>
> I just answered the question
>
> > Is there a Mata equivalent to Stata's -capture- statement?
>
> from Daniel Hoechle <daniel.hoechle@gmail.com>, and I now see that
> Ben Jann <ben.jann@soz.gess.ethz.ch> chimed in,
>
> > I would be interested in that, too.
>
> In case you haven't noticed, Ben is pretty proficient with Mata and is
> busy writing functions for all of us to use. So I would like to write a
> second answer aimed at function writers on how functions should behave.
> This will be of interest too to function consumers, because it will
> reveal what one can expect when approaching a new function for the first
> time.
>
> Let's say you are writing function xyz(). The function has a set of
> requirements, let's call the set R, that must be met in order to perform
> its actions. R might be that the input matrix is square and full rank,
> or that the file exists, etc.
>
> There are three possible actions a function can take when a requirement
> is not met,
>
>     A1. Abort with error.
>
>     A2. Return a missing result with the appropriate number of rows and
>         columns.
>
>     A3. Return a special value that indicates problems.
>
> The purpose of this posting is to outline when the function should do
> which.
>
> Divide the requirements R into two subsets, R_1 and R_2. R_1 is the
> subset of requirements that are easy for the user to verify. R_2 is the
> remainder, the subset that is difficult to establish before calling.
>
> The following are the guidelines we try to follow:
>
>     G1. For all elements in R_1, action A1 is appropriate.
>
>     G2. For all elements in R_2, actions A2 or A3 are appropriate.
>
>     G3. For an element in R_2, action A1 is allowed, but then
>         there should be a corresponding _xyz() function that takes
>         action A2 or A3.
>
>     G4. For numerical functions, action A3 is to be avoided whenever
>         possible. Action A2 is preferred.
>
>     G5. Action A3 is appropriate only for nonnumerical functions, but
>         ...
>
>
>
> G1. For all elements in R_1, action A1 is appropriate
> -----------------------------------------------------
>
> xyz() might require that input matrix A be square. If the user wants
> to be robust to nonsquare matrices, it is easy enough to code
>
>         if (rows(A)==cols(A)) result = xyz(A)
>         else {
>                 // do something else
>         }
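
A minimal sketch of G1 seen from both sides, assuming xyz() is the hypothetical function from the post (its body here is only a placeholder) and safe_xyz() is a made-up name for a caller that wants to be robust. The function writer aborts on a nonsquare matrix; the caller who cares checks first.

```mata
mata:
// xyz(): hypothetical function; squareness is in R_1, so abort (action A1)
real matrix xyz(real matrix A)
{
    if (rows(A) != cols(A)) _error("xyz(): square matrix required")
    return(A)                       // placeholder for the real calculation
}

// safe_xyz(): hypothetical caller that verifies the R_1 requirement itself
real matrix safe_xyz(real matrix A)
{
    if (rows(A) == cols(A)) return(xyz(A))
    return(J(0, 0, .))              // "do something else": here, a 0 x 0 result
}
end
```

What the caller does in the nonsquare branch is its own business; the 0 x 0 result is only one possibility.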
>
> G2. For all elements in R_2, actions A2 or A3 are appropriate
> --------------------------------------------------------------
>
> xyz() might require input matrix A be positive definite. It is difficult
> for the caller to know whether A really is positive definite, and
> therefore xyz() must take some action other than A1 when A is not
> positive definite.
>
> xyz() might require that input matrix A be full rank. It is easy enough
> for the user to check that,
>
> if (rank(A)!= rows(A)) ...
>
> but look carefully at the documentation of rank(). Function rank()
> performs considerable calculation in order to obtain its result. Thus,
> full rank is considered R_2, not R_1.
>
> In most cases, that A is not full rank will be easily discovered in the
> code of xyz() because there will be a division by zero, an unexpected
> negative intermediate calculation, and the like. If, however, xyz() would
> never discover that A is not full rank in the natural order of things, it
> becomes even more important that xyz() check that A be of full rank lest
> xyz() return misleading results.
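
A minimal sketch of an explicit R_2 check, assuming (purely for illustration) that the hypothetical xyz() returns the inverse of A. The squareness test is R_1 and aborts; the rank test is R_2 and returns missing values of the right dimension.

```mata
mata:
// xyz(): hypothetical function; the inverse is only an illustrative payload
real matrix xyz(real matrix A)
{
    // R_1, cheap for the caller to verify: abort with error (A1)
    if (rows(A) != cols(A)) _error("xyz(): square matrix required")

    // R_2, costly for the caller to verify: return missing (A2)
    if (rank(A) != rows(A)) return(J(rows(A), cols(A), .))

    return(luinv(A))
}
end
```

The rank() call costs the function real time on every call, which is the trade-off the surrounding discussion weighs.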
>
> There would be an exception to the above. Suppose xyz() will be used
> repeatedly and it is desirable that xyz() be fast. Moreover, xyz() is
> typically used along with a suite of other functions, all of which also
> require full rankedness. Hence, xyz() does not want to waste time
> checking something that is likely to be true. In such cases, if xyz()
> would return a misleading result with a non full-rank matrix, xyz()
> should be renamed _xyz(), and the documentation should emphasize that it
> is the caller's responsibility to check that the matrix is full rank.
>
> There are lots of other examples having nothing to do with matrices
> that fit into the above model, such as whether a file exists, a
> variable exists, etc.
>
>
> G3. For an element in R_2, action A1 is allowed, but ...
> --------------------------------------------------------
>
> It is often the case, especially in numerical subroutines, that the
> caller desires action A1. In 99.9% of cases, the requirement (say
> positive definiteness) will be met, and in the 0.1% of cases where it
> isn't, the user never intended to write code to handle the case, anyway.
> Crashing out is a fine solution.
>
> In that case, there needs to be a companion function _xyz() that does
> not take action A1. Programmers implementing complicated systems
> need to be able to capture unlikely situations.
>
> Think of guideline G3 as the escape clause for G2. G3 allows you to
> ignore G2 and make an easy-to-use function xyz() for most callers.
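
A minimal sketch of the xyz()/_xyz() pairing, assuming for illustration that both hypothetical functions return the Cholesky factor of A. It relies on cholesky() returning missing values when A is not positive definite.

```mata
mata:
// _xyz(): hypothetical companion; returns missing values (A2) when A is
// not positive definite, so careful callers can test the result
real matrix _xyz(real matrix A)
{
    if (rows(A) != cols(A)) _error("_xyz(): square matrix required")
    return(cholesky(A))
}

// xyz(): hypothetical easy-to-use version; aborts (A1) in the same case
real matrix xyz(real matrix A)
{
    real matrix G

    G = _xyz(A)
    if (hasmissing(G)) _error("xyz(): matrix not positive definite")
    return(G)
}
end
```

Most callers would use xyz() and let it crash out; a programmer building a larger system would call _xyz() and check hasmissing() on the result.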
>
>
> G4. For numerical functions, action A3 is to be avoided whenever
> possible. Action A2 is preferred.
> -----------------------------------------------------------------
>
> In the case of numerical functions, A2 is the preferred action.
> The returned result should contain missing values, it should be of
> the appropriate numerical type, and it should be of the appropriate
> dimension.
>
> For instance, function xyz(A) might return A^(-1). It might require
> that A be square and positive definite. Action A1 would be appropriate
> for handling the square restriction. To handle the second restriction,
> the appropriate action would be to return an n x n matrix of missing
> values.
>
> The reason for this is that the caller can then ignore such issues if he
> or she wishes. Subsequent calculations will work because matrices will
> be conformable, but the missing values will propagate, just as they
> should.
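
A minimal sketch of how the missing values propagate, assuming the hypothetical xyz() inverts a positive-definite matrix with cholinv(). The matrix A below is symmetric but not positive definite, so xyz() hands back a 2 x 2 matrix of missing values and the later product still conforms.

```mata
mata:
// xyz(): hypothetical function returning A^(-1) for positive-definite A
real matrix xyz(real matrix A)
{
    if (rows(A) != cols(A)) _error("xyz(): square matrix required")    // A1
    if (hasmissing(cholesky(A))) return(J(rows(A), cols(A), .))        // A2
    return(cholinv(A))
}

A = (1, 2 \ 2, 1)        // symmetric, eigenvalues 3 and -1
B = xyz(A)               // 2 x 2, all missing
C = B * (1 \ 1)          // conformable, so this line does not abort
hasmissing(C)            // the missing values carried through to C
end
```

The caller wrote no error-handling code at all, yet the final result cannot be mistaken for a valid number.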
>
>
> G5. Action A3 is appropriate only for nonnumerical functions, but ...
> ----------------------------------------------------------------------
>
> Try to avoid returning special values, especially when they are mixed
> in with valid values.
>
> Sometimes it is unavoidable. In such cases, the function name should
> start with an underscore. Function _fopen() returns a positive or
> negative result. A positive result is a file handle. A negative result
> is a problem code.
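
A minimal sketch of a caller handling _fopen()'s result, assuming a made-up helper _nlines() that counts the lines in a file. Because it passes the negative problem code straight back, mixed in with valid counts, its own name starts with an underscore as well.

```mata
mata:
// _nlines(): hypothetical helper; returns the number of lines in fn,
// or the negative problem code from _fopen() if fn cannot be opened
real scalar _nlines(string scalar fn)
{
    real scalar fh, n

    fh = _fopen(fn, "r")
    if (fh < 0) return(fh)                  // special value: problem code

    n = 0
    while (fget(fh) != J(0, 0, "")) n++     // fget() returns J(0,0,"") at EOF
    fclose(fh)
    return(n)
}
end
```

A caller would test the sign of the result before using it as a count.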
>
> Missing value is never considered a "special value", and this guideline
> does not apply to missing value. Returning a missing value when
> requirements are not met is desirable.
>
> Concerning special values, when only special values are returned, the
> convention is that 0 indicates success. 1 might indicate failure, or
> different positive or negative values might be used to indicate the
> type of failure. THIS IS DIFFERENT FROM THE CONVENTIONS USED IN
> MANY OTHER PROGRAMMING LANGUAGES, where 0 is often used to indicate
> failure.
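
A minimal sketch of the 0-means-success convention, assuming a made-up function _xyz_replace() that overwrites A with its Cholesky factor and returns only a status code. The name and the choice of 1 as the failure code are illustrative, not taken from any official function.

```mata
mata:
// _xyz_replace(): hypothetical; replaces A with its Cholesky factor
// returns 0 on success, 1 if A is not positive definite (A unchanged)
real scalar _xyz_replace(real matrix A)
{
    real matrix G

    if (rows(A) != cols(A)) _error("_xyz_replace(): square matrix required")
    G = cholesky(A)
    if (hasmissing(G)) return(1)    // nonzero: failure
    A = G                           // arguments are passed by reference
    return(0)                       // 0: success
}

A = (4, 2 \ 2, 3)
rc = _xyz_replace(A)
if (rc) printf("A is not positive definite\n")
end
```

With this convention the natural test reads "if (rc)" for the failure branch, the opposite of languages where 0 signals failure.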
>
>
> -- Bill
> wgould@stata.com
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/