Statalist The Stata Listserver


RE: st: Mata question

From   "Ben Jann" <[email protected]>
To   <[email protected]>
Subject   RE: st: Mata question
Date   Thu, 20 Apr 2006 18:38:03 +0200

Thanks Bill. Very helpful.

> -----Original Message-----
> From: [email protected] [mailto:owner-[email protected]] On Behalf Of William Gould, Stata
> Sent: Thursday, April 20, 2006 5:10 PM
> To: [email protected]
> Subject: Re: st: Mata question
> I just answered the question
> > Is there a Mata equivalent to Stata's -capture- statement?
> from Daniel Hoechle <[email protected]>, and I now see that
> Ben Jann <[email protected]> chimed in,
> > I would be interested in that, too.
> In case you haven't noticed, Ben is pretty proficient with Mata and is
> busy writing functions for all of us to use.  So I would like to write a
> second answer, aimed at function writers, on how functions should behave
> when their requirements are not met.  This will be of interest to
> function consumers too, because it will reveal what one can expect when
> approaching a new function for the first time.
> Let's say you are writing function xyz().  The function has a set of
> requirements, let's call the set R, that must be met in order for it to
> perform its actions.  R might be that the input matrix is square and of
> full rank, or that a file exists, etc.
> There are three possible actions a function can take when a requirement
> is not met:
>       A1.  Abort with error.
>       A2.  Return a missing result with the appropriate number of rows and
>            columns.
>       A3.  Return a special value that indicates problems.
> The purpose of this posting is to outline when a function should take
> which action.
> Divide the requirements R into two subsets, R_1 and R_2.  R_1 is the
> subset of requirements that are easy for the caller to verify.  R_2 is
> the remainder, the subset that is difficult to establish before calling.
> The following are the guidelines we try to follow:
>      G1.  For all elements in R_1, action A1 is appropriate.
>      G2.  For all elements in R_2, actions A2 or A3 are appropriate.
>      G3.  For an element in R_2, action A1 is allowed, but then
>           there should be a corresponding _xyz() function that takes
>           action A2 or A3.
>      G4.  For numerical functions, action A3 is to be avoided whenever
>           possible.  Action A2 is preferred.
>      G5.  Action A3 is appropriate only for nonnumerical functions, but ...
> G1.  For all elements in R_1, action A1 is appropriate
> -------------------------------------------------------
> xyz() might require that input matrix A be square.  If the user wants
> to be robust to nonsquare matrices, it is easy enough to code
>          if (rows(A)==cols(A)) result = xyz(A)
>          else {
>                  // do something else
>          }
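For readers who think in other languages, here is a rough Python analogue of G1 (the post itself concerns Mata; Python is used only for illustration). The names trace() and trace_if_square() and the list-of-rows matrix representation are assumptions of this sketch. Squareness is cheap to check, so the function aborts (action A1) by raising an exception, and a caller who wants robustness checks first, as in the Mata snippet above.

```python
def trace(a):
    """Return the sum of the diagonal of square matrix a (a list of rows).

    Squareness is an R_1 requirement (cheap for the caller to verify),
    so a violation aborts with an error: action A1.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("square matrix required")
    return sum(a[i][i] for i in range(n))


def trace_if_square(a):
    """Caller-side guard: verify the cheap requirement before calling."""
    if all(len(row) == len(a) for row in a):
        return trace(a)
    return None  # stand-in for "do something else"
```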
> G2.  For all elements in R_2, actions A2 or A3 are appropriate
> --------------------------------------------------------------
> xyz() might require that input matrix A be positive definite.  It is
> difficult for the caller to know whether A really is positive definite,
> and so xyz() must take some action other than A1 in the case where A is
> not positive definite.
> xyz() might require that input matrix A be full rank.  It is easy
> for the user to check that,
>           if (rank(A)!= rows(A)) ...
> but look carefully at the documentation of rank().  Function rank()
> performs considerable calculation in order to obtain its result.  Thus,
> full rank is considered R_2, not R_1.
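To see why full rank lands in R_2, here is a minimal Python sketch of one common way to compute a numerical rank: Gaussian elimination with partial pivoting. It is not necessarily how Mata's rank() works; the name numerical_rank() and the fixed absolute tolerance are assumptions made for illustration. Even this simple version does on the order of n^3 arithmetic operations for an n x n matrix and needs a tolerance to decide what counts as zero, so the "easy" check can cost about as much as the computation it is meant to guard.

```python
def numerical_rank(a, tol=1e-10):
    """Estimate the rank of matrix a (a list of rows) by Gaussian elimination.

    Uses partial pivoting; any pivot with absolute value <= tol is treated
    as zero.  The fixed absolute tolerance is a simplification.
    """
    m = [list(map(float, row)) for row in a]  # work on a float copy
    nrows = len(m)
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        if rank == nrows:
            break
        # Choose the remaining row with the largest entry in this column.
        piv = max(range(rank, nrows), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) <= tol:
            continue  # no usable pivot: this column adds no rank
        m[rank], m[piv] = m[piv], m[rank]
        # Eliminate the column below the pivot row.
        for r in range(rank + 1, nrows):
            f = m[r][col] / m[rank][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank
```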
> In most cases, that A is not full rank will be easily discovered in the
> code of xyz() because there will be a division by zero, an unexpected
> result in an intermediate calculation, and the like.  If, however, xyz()
> would not discover that A is not full rank in the natural order of
> things, it becomes even more important that xyz() check that A be of
> full rank lest xyz() return misleading results.
> An exception to the above arises when xyz() will be used heavily and it
> is desirable that xyz() be fast.  Moreover, xyz() is typically used
> along with a suite of other functions, all of which also require full
> rankedness.  Hence, xyz() does not want to waste time checking something
> that is likely to be true.  In such cases, if xyz() would return a
> misleading result with a non-full-rank matrix, xyz() should be renamed
> _xyz(), and the documentation should emphasize that it is the caller's
> responsibility to check that the matrix is full rank.
> There are lots of other examples having nothing to do with matrices
> that fit into the above model, such as whether a file exists, a
> variable exists, etc.
> G3.  For an element in R_2, action A1 is allowed, but ...
> ---------------------------------------------------------
> It is often the case, especially in numerical subroutines, that the
> caller desires action A1.  In 99.9% of cases, the requirement (say,
> positive definiteness) will be met, and in the 0.1% of cases where it
> isn't, the user never intended to write code to handle the case, so
> crashing out is a fine solution.
> In that case, there needs to be a companion function _xyz() that does
> not take action A1.  Programmers implementing complicated systems
> need to be able to capture unlikely situations.
> Think of guideline G3 as the escape clause for G2.  G3 allows you to
> ignore G2 and make an easy-to-use function xyz() for most callers.
> G4.  For numerical functions, action A3 is to be avoided whenever
>      possible.  Action A2 is preferred.
> -----------------------------------------------------------------
> In the case of numerical functions, A2 is the preferred action.
> The returned result should contain missing values, it should be of
> the appropriate numerical type, and it should be of the appropriate
> dimension.
> For instance, function xyz(A) might return A^(-1).  It might require
> that A be square and positive definite.  Action A1 would be appropriate
> for handling the square restriction.  To handle the second restriction,
> the appropriate action would be to return an n x n matrix of missing
> values.  The reason for this is that the caller can then ignore such
> issues if he or she wishes.  Subsequent calculations will work because
> the matrices will be conformable, but the missing values will propagate,
> just as they should.
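The following Python sketch illustrates why action A2 is convenient for callers. inverse_or_missing() is a hypothetical stand-in for xyz(): to keep it short it requires only that A be nonsingular rather than positive definite, and it uses NaN in place of Mata's missing value. A nonsquare input aborts (A1); a singular input yields an n x n matrix of NaN (A2), so a later matrix product is still conformable and simply produces NaN.

```python
import math


def inverse_or_missing(a, tol=1e-12):
    """Invert square matrix a (a list of rows) by Gauss-Jordan elimination.

    Nonsquare input aborts (action A1).  Singular input returns an n x n
    matrix of NaN (action A2).  tol is an illustrative absolute cutoff.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("square matrix required")
    # Build the augmented matrix [a | I].
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if not abs(m[piv][col]) > tol:  # singular (or NaN entries)
            return [[math.nan] * n for _ in range(n)]
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]  # scale pivot row to a leading 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]  # right half now holds the inverse


def matmul(a, b):
    """Plain matrix product; NaN entries propagate through the sums."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# A singular matrix: the product is still 2 x 1, but every entry is NaN.
x = matmul(inverse_or_missing([[1.0, 2.0], [2.0, 4.0]]), [[1.0], [1.0]])
```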
> G5.  Action A3 is appropriate only for nonnumerical functions, but ...
> ----------------------------------------------------------------------
> Try to avoid returning special values, especially when they are mixed
> in with valid values.
> Sometimes it is unavoidable.  In such cases, the function name should
> start with an underscore.  Function _fopen() returns a positive or
> negative result.  A positive result is a file handle.  A negative result
> is an error code.  Missing value is never considered a "special value",
> and this guideline does not apply to missing value.  Returning a missing
> value when requirements are not met is desirable.
> Concerning special values, when only special values are returned, the
> convention is that 0 indicates success.  1 might indicate failure, or
> different positive or negative values might be used to indicate the
> nature of the problem.  This differs from MANY OTHER PROGRAMMING
> LANGUAGES, where 0 is often used to indicate failure.
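A small Python illustration of the 0-means-success convention, in the same spirit as Stata's _rc, which is 0 after a successful command. check_readable() and its codes 1 to 3 are hypothetical. Because 0 is falsy in Python, a caller can write `if rc:` to branch to the failure handling.

```python
import os


def check_readable(path):
    """Return 0 if path is a readable regular file, else a positive code.

    Hypothetical codes: 1 = does not exist, 2 = not a regular file,
    3 = not readable by this process.
    """
    if not os.path.exists(path):
        return 1
    if not os.path.isfile(path):
        return 2
    if not os.access(path, os.R_OK):
        return 3
    return 0


rc = check_readable("/nonexistent_dir_for_sketch/data.csv")
if rc:  # any nonzero code means failure
    print(f"check failed with code {rc}")
```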
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *
> *
> *

© Copyright 1996–2024 StataCorp LLC