Statalist The Stata Listserver



RE: st: Mata question


From   "Ben Jann" <[email protected]>
To   <[email protected]>
Subject   RE: st: Mata question
Date   Thu, 20 Apr 2006 18:38:03 +0200

Thanks Bill. Very helpful.
ben

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of William Gould, Stata
> Sent: Thursday, April 20, 2006 5:10 PM
> To: [email protected]
> Subject: Re: st: Mata question
> 
> I just answered the question
> 
> > Is there a Mata equivalent to Stata's -capture- statement?
> 
> from Daniel Hoechle <[email protected]>, and I now see that
> Ben Jann <[email protected]> chimed in,
> 
> > I would be interested in that, too.
> 
> In case you haven't noticed, Ben is pretty proficient with Mata and is
> busy writing functions for all of us to use.  So I would like to write a
> second answer, aimed at function writers, on how functions should behave.
> This will also interest function consumers, because it reveals what one
> can expect when approaching a new function for the first time.
> 
> Let's say you are writing function xyz().  The function has a set of
> requirements, call the set R, that must be met in order for it to
> perform its actions.  R might be that the input matrix is square and
> full rank, or that a file exists, etc.
> 
> There are three possible actions a function can take when a requirement
> is not met:
>
>       A1.  Abort with error.
>
>       A2.  Return a missing result with the appropriate number of rows and
>            columns.
>
>       A3.  Return a special value that indicates problems.
> 
> The purpose of this posting is to outline when the function should do
> which.
> 
> Divide the requirements R into two subsets, R_1 and R_2.  R_1 is the
> subset of requirements that are easy for the user to verify.  R_2 is the
> remainder, the subset that is difficult to establish before calling.
> 
> The following are the guidelines we try to follow:
> 
>      G1.  For all elements in R_1, action A1 is appropriate.
> 
>      G2.  For all elements in R_2, actions A2 or A3 are appropriate.
> 
>      G3.  For an element in R_2, action A1 is allowed, but then
>           there should be a corresponding _xyz() function that takes
>           action A2 or A3.
>
>      G4.  For numerical functions, action A3 is to be avoided whenever
>           possible.  Action A2 is preferred.
>
>      G5.  Action A3 is appropriate only for nonnumerical functions, but
>           ...
> 
> 
> 
> G1.  For all elements in R_1, action A1 is appropriate
> ------------------------------------------------------
> 
> xyz() might require that input matrix A be square.  If the user wants
> to be robust to nonsquare matrices, it is easy enough to code
> 
>          if (rows(A)==cols(A)) result = xyz(A)
>          else {
>                  // do something else
>          }
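
On the function writer's side, G1 amounts to a cheap, explicit check at the top of xyz() that aborts when an R_1 requirement fails. The sketch below is a minimal illustration only: the body of xyz() is hypothetical (it simply returns the symmetric part of A), and error code 3205 is assumed here to be Mata's "square matrix required" code.

```mata
// Hypothetical xyz(): returns the symmetric part of A, (A + A')/2.
// Squareness is in R_1 (the caller can test rows(A)==cols(A) cheaply),
// so a violation triggers action A1: abort with error.
real matrix xyz(real matrix A)
{
        if (rows(A) != cols(A)) _error(3205)    // square matrix required
        return((A + A') / 2)
}
```

Because the check is trivial for the caller to repeat, nothing is lost by aborting: a caller who wants robustness writes the if/else shown above.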
> 
> G2.  For all elements in R_2, actions A2 or A3 are appropriate
> --------------------------------------------------------------
> 
> xyz() might require that input matrix A be positive definite.  It is
> difficult for the caller to know whether A really is positive definite,
> and therefore xyz() must take some action other than A1 in the
> non-positive-definite case.
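
A minimal sketch of G2 for the positive-definite case, assuming (hypothetically) that xyz() computes the log determinant of A. Mata's cholesky() returns a matrix of missing values when A is not positive definite, so the failure is detected as a by-product of the computation and reported through action A2, a missing scalar.

```mata
// Hypothetical xyz(): log determinant of a positive-definite matrix A.
real scalar xyz(real matrix A)
{
        real matrix G

        if (rows(A) != cols(A)) _error(3205)    // R_1: abort (action A1)

        G = cholesky(A)                         // lower-triangular G, A = G*G'
        if (hasmissing(G)) return(.)            // R_2: not PD, action A2

        // log|A| = log(|G|^2) = 2 * sum of the logs of G's diagonal
        return(2 * sum(ln(diagonal(G))))
}
```

The caller never had to decide in advance whether A was positive definite; a missing result says so after the fact.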
> 
> xyz() might require that input matrix A be full rank.  It is easy enough
> for the user to code a check,
>
>           if (rank(A)!=rows(A)) ...
>
> but look carefully at the documentation of rank().  Function rank()
> performs considerable calculation to obtain its result.  Thus, full rank
> is considered R_2, not R_1.
> 
> In most cases, the fact that A is not full rank will be discovered
> naturally in the code of xyz(), because there will be a division by zero,
> an unexpected negative intermediate result, and the like.  If, however,
> xyz() would never discover that A is not full rank in the natural order
> of things, it becomes even more important that xyz() check that A is of
> full rank, lest xyz() return misleading results.
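
As an illustration of rank deficiency being discovered in the natural order of things, suppose the hypothetical xyz() solves A*x = b. Mata's lusolve() detects singularity during its own LU factorization and returns missing values, so xyz() gets action A2 without a separate, costly rank() call. This is a sketch under that assumption, not a prescribed implementation.

```mata
// Hypothetical xyz(): solve A*x = b for x.
real colvector xyz(real matrix A, real colvector b)
{
        // Conformability is cheap to check, so it belongs to R_1 (action A1)
        if (rows(A) != cols(A)) _error(3205)    // square matrix required
        if (rows(b) != rows(A)) _error(3200)    // conformability error

        // Full rank is R_2: lusolve() returns missing values when A is
        // singular, which yields action A2 at no extra cost.
        return(lusolve(A, b))
}
```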
> 
> There is an exception to the above.  Suppose xyz() will be used
> repeatedly and it is desirable that xyz() be fast.  Moreover, xyz() is
> typically used along with a suite of other functions, all of which also
> require full rank.  Hence, xyz() should not waste time checking something
> that is likely to be true.  In such cases, if xyz() would return a
> misleading result with a non-full-rank matrix, xyz() should be renamed
> _xyz(), and the documentation should emphasize that it is the caller's
> responsibility to check that the matrix is full rank.
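
A sketch of the fast, renamed _xyz() described above, under the hypothetical assumption that it computes inv(A)*B with invsym(). invsym() does not fail on a singular matrix; it returns a generalized inverse, so the result looks valid but is misleading. That is why the full-rank check moves to the caller, who can perform it once before many calls. Both function names are made up for illustration.

```mata
// Hypothetical _xyz(): returns inv(A)*B for symmetric A, as fast as possible.
// It does NOT check that A is full rank.  With a singular A, invsym()
// returns a generalized inverse, so the result has no missing values
// yet is misleading.  Checking rank is the caller's responsibility.
real matrix _xyz(real matrix A, real matrix B)
{
        return(invsym(A) * B)
}

// Hypothetical caller: verify full rank once, then call _xyz() repeatedly.
real matrix solve_all(real matrix A, real matrix Bs)
{
        real scalar j
        real matrix X

        if (rank(A) != rows(A)) return(J(rows(A), cols(Bs), .))
        X = J(rows(A), cols(Bs), .)
        for (j = 1; j <= cols(Bs); j++) X[., j] = _xyz(A, Bs[., j])
        return(X)
}
```

The expensive rank() call is paid once per A rather than once per column of Bs.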
> 
> There are lots of other examples having nothing to do with matrices
> that fit into the above model, such as whether a file exists, a
> variable exists, etc.
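
For the nonmatrix cases, Mata's built-ins already give callers nonaborting tests: fileexists() for files, and _st_varindex(), which returns missing rather than aborting when a variable is not found. The helper below is hypothetical. One plausible reason file existence fits R_2 rather than R_1 is that a file can disappear between the check and the open.

```mata
// Hypothetical have_inputs(): 1 if file fn exists and variable vn is in
// the dataset in memory, 0 otherwise.  Neither test aborts.
real scalar have_inputs(string scalar fn, string scalar vn)
{
        if (!fileexists(fn)) return(0)
        if (_st_varindex(vn) == .) return(0)    // missing => no such variable
        return(1)
}
```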
> 
> 
> G3.  For an element in R_2, action A1 is allowed, but ...
> ---------------------------------------------------------
> 
> It is often the case, especially in numerical subroutines, that the
> caller desires action A1.  In 99.9% of cases, the requirement (say,
> positive definiteness) will be met, and in the 0.1% of cases where it
> isn't, the user never intended to write code to handle the case anyway.
> Crashing out is a fine solution.
> 
> In that case, there needs to be a companion function _xyz() that does
> not take action A1.  Programmers implementing complicated systems
> need to be able to capture unlikely situations.
> 
> Think of guideline G3 as the escape clause for G2.  G3 allows you to
> ignore G2 and make an easy-to-use function xyz() for most callers.
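
A minimal sketch of a G3 pair, assuming (hypothetically) that the functions compute the log determinant of a positive-definite matrix. _xyz() takes action A2 and returns missing; xyz() is the convenience wrapper that takes action A1 for callers who never intend to handle the failure.

```mata
// Hypothetical pair computing the log determinant of a positive-definite A.

// _xyz(): action A2, returns missing if A is not positive definite.
real scalar _xyz(real matrix A)
{
        real matrix G

        if (rows(A) != cols(A)) _error(3205)    // R_1: square matrix required
        G = cholesky(A)
        if (hasmissing(G)) return(.)
        return(2 * sum(ln(diagonal(G))))
}

// xyz(): convenience version, action A1, aborts in the rare failure case.
real scalar xyz(real matrix A)
{
        real scalar r

        r = _xyz(A)
        if (r == .) _error("xyz(): A is not positive definite")
        return(r)
}
```

Writing xyz() on top of _xyz() keeps the two versions from drifting apart: the only difference is what happens on failure.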
> 
> 
> G4.  For numerical functions, action A3 is to be avoided whenever
>      possible.  Action A2 is preferred.
> -----------------------------------------------------------------
> 
> In the case of numerical functions, A2 is the preferred action.
> The returned result should contain missing values, it should be of
> the appropriate numerical type, and it should be of the appropriate
> dimension.
> 
> For instance, function xyz(A) might return A^(-1).  It might require
> that A be square and positive definite.  Action A1 would be appropriate
> for handling the square restriction.  To handle the second restriction,
> the appropriate action would be to return an n x n matrix of missing
> values.
> 
> The reason for this is that the caller can then ignore such issues if he
> or she wishes.  Subsequent calculations will work because matrices will
> be conformable, but the missing values will propagate, just as they
> should.
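
A sketch of the A^(-1) example, assuming cholinv() for the inversion (a choice made here for illustration). The nonsquare case aborts (A1); the non-positive-definite case returns an n x n matrix of missing values (A2), so a caller who ignores the issue still gets conformable results in which the missing values propagate.

```mata
// Hypothetical xyz(): returns A^(-1) for square, positive-definite A.
real matrix xyz(real matrix A)
{
        real scalar n

        n = rows(A)
        if (cols(A) != n) _error(3205)          // R_1: action A1

        // R_2: cholesky() returns missing values when A is not positive
        // definite; respond with an n x n matrix of missing values (A2)
        if (hasmissing(cholesky(A))) return(J(n, n, .))

        return(cholinv(A))
}

// Caller ignoring the issue: the product is still conformable (2 x 1),
// and the missing values propagate into x.
x = xyz((1,2\2,1)) * (1\1)
```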
> 
> 
> G5.  Action A3 is appropriate only for nonnumerical functions, but ...
> ----------------------------------------------------------------------
> 
> Try to avoid returning special values, especially when they are mixed
> in with valid values.
> 
> Sometimes it is unavoidable.  In such cases, the function name should
> start with an underscore.  Function _fopen() returns a positive or
> negative result.  A positive result is a file handle.  A negative result
> is a problem code.
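
In practice, a caller of _fopen() branches on the sign of the result. This sketch wraps it in a hypothetical read_first_line() that itself takes action A2, returning "" (Mata's missing string) on failure; ferrortext() turns the negative code into a readable message.

```mata
// Hypothetical read_first_line(): returns the first line of file fn, or ""
// (Mata's missing string) if the file cannot be opened or is empty.
string scalar read_first_line(string scalar fn)
{
        real scalar   fh
        string matrix line

        fh = _fopen(fn, "r")            // file handle, or negative error code
        if (fh < 0) {
                errprintf("%s\n", ferrortext(fh))
                return("")
        }
        line = fget(fh)                 // J(0,0,"") at end of file
        fclose(fh)
        if (rows(line) == 0) return("") // empty file
        return(line)
}
```

The special value stays confined to _fopen(); callers of read_first_line() only ever see a string or a missing string.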
> 
> Missing value is never considered a "special value", and this guideline
> does not apply to missing value.  Returning a missing value when
> requirements are not met is desirable.
> 
> Concerning special values, when only special values are returned, the
> convention is that 0 indicates success.  1 might indicate failure, or
> different positive or negative values might be used to indicate the
> type of failure.  THIS IS DIFFERENT FROM THE CONVENTIONS USED IN
> MANY OTHER PROGRAMMING LANGUAGES, where 0 is often used to indicate
> failure.
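
A minimal sketch of a function that returns only a status code under this convention, assuming (hypothetically) that _xyz() stores its result in an argument: 0 means success and 1 means failure. Mata passes arguments by address, so assigning to res inside the function changes the caller's variable, and because nonzero is true, the caller can test the code directly.

```mata
// Hypothetical _xyz(): Cholesky factor of A returned through res.
// Return value is a status code only: 0 = success, 1 = not positive definite.
real scalar _xyz(real matrix A, real matrix res)
{
        if (rows(A) != cols(A)) _error(3205)    // R_1: still action A1
        res = cholesky(A)
        if (hasmissing(res)) return(1)
        return(0)
}

// Usage: any nonzero code means failure.
G = .
rc = _xyz((1,2\2,1), G)
if (rc) errprintf("_xyz() failed with code %s\n", strofreal(rc))
```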
> 
> 
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2024 StataCorp LLC