# Re: st: Missing values in Mata functions

From: wgould@stata.com (William Gould, Stata)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Missing values in Mata functions
Date: Fri, 09 Sep 2005 11:21:24 -0500

```
I want to add something to my posting on the treatment of missing values
in Mata functions.  You may remember that I divided the numeric functions
into three categories,

(M)  Mathematical functions

(S)  Statistical functions

(U)  Utility functions

and then I said that (M) functions should handle missing values, that it
does not much matter what (S) functions do, and that (U) functions either
do not allow missing values or give a special meaning to them.

I said that it does not much matter what a category (S) function does because
good programming style is to make sure what is passed to them does not contain
missing values, and that is easy to do.  It is good programming style because,
as the data are divided into separate, easy-to-use matrices, one subroutine
might exclude one set of observations and another subroutine, another set.

Remember, what distinguishes a category (S) function is that it works on raw
data, and such data is invariably obtained from st_data() or st_view(),
where it is easy to exclude the missing values at the outset.

Thus, I argued, although I did not explicitly say this, writing additional
code to handle missing values in a category (S) function is probably a waste
of time because

1.  It is probably better that a category (S) function does not
allow missing values, because otherwise, the user of the
function may be led into sloppy and dangerous habits.

2.  Ben Jann <ben.jann@soz.gess.ethz.ch>, who asked the original
question, said he was doing this by coding

if (missing(x)) _error(3351)

Good idea, except -missing(x)- can be expensive to calculate.
The missing() function has to make a pass through the data,
looking for missing values, to establish that there are not
any.

Hence, even though it is probably better that a category (S) function
does not allow missing values, there is a cost to imposing that.

So here is what I add now:

In a category (S) function that does not accept missing values,
it is acceptable to omit

if (missing(x)) _error(3351)

as long as the function does something ugly in the presence of
missing values.  The ugly action could be to abort with an error, or
it could be a result with some or all missing values.  As long
as something ugly happens, the user of the function cannot be misled.

On the other hand, if the function would return something that
could be misinterpreted as a valid result, one should probably
include

if (missing(x)) _error(3351)

-- Bill
wgould@stata.com
```
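
To make the two cases above concrete, here is a minimal sketch that is not part of the original posting. The function name my_colmeans() is hypothetical, and the example assumes the auto dataset shipped with Stata is in memory. It relies on two behaviors described in the Mata documentation but not stated in this post, so treat both as assumptions: a selectvar argument of 0 makes st_data() omit observations with a missing value in any selected variable, and colsum() by default treats missing values as contributing zero. Under the second assumption, colsum(X) :/ rows(X) would return a plausible-looking but wrong mean if X contained missing values, which is the case where keeping the check seems worth its cost.

```mata
mata:

// Hypothetical category (S) function: column means of raw data.
// Assumption: colsum() counts missing values as zero by default, so
// without the check a missing value would yield a number that looks
// valid but is not.  Hence the explicit guard.
real rowvector my_colmeans(real matrix X)
{
        if (missing(X)) _error(3351)    // 3351: argument has missing values
        return(colsum(X) :/ rows(X))
}

// Caller side: exclude incomplete observations at the outset.
// Assumes the auto data are in memory (in Stata: sysuse auto).
// selectvar = 0 omits rows with a missing value in mpg or rep78.
X = st_data(., ("mpg", "rep78"), 0)
my_colmeans(X)

end
```

A version of the function whose arithmetic propagates missing values into its result would already be ugly in the sense described above, and could reasonably omit the check and save the extra pass through the data.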