The Stata listserver

Re: st: Missing values in Mata functions


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Missing values in Mata functions
Date   Fri, 09 Sep 2005 11:21:24 -0500

I want to add something to my posting on the treatment of missing values
in Mata functions.  You may remember that I divided the numeric functions
into three categories, 

      (M)  Mathematical functions 

      (S)  Statistical functions

      (U)  Utility functions

and then I said that (M) functions should handle missing values, that it 
does not much matter what (S) functions do, and that (U) functions either 
do not allow missing values or give them a special meaning.

I said that it does not much matter what a category (S) function does because
good programming style is to make sure that what is passed to it contains no
missing values, and that is easy to do.  It is good programming style because,
when the data are divided into separate, easy-to-use matrices, one subroutine
left to its own devices might exclude one set of observations and another
subroutine a different set.

Remember, what distinguishes a category (S) function is that it works on raw
data, and such data are invariably obtained from st_data() or st_view(),
where it is easy to exclude the missing values at the outset.
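To make "exclude the missing values at the outset" concrete, here is a minimal sketch in Python rather than Mata, assuming NaN plays the role of a Mata missing value and a dict of equal-length columns plays the role of the variables read with st_data() or st_view().  The helper name complete_cases() is hypothetical.  The point is that the filtering happens once, so every later subroutine works on exactly the same observations.

```python
import math
from typing import Dict, List, Sequence


def complete_cases(columns: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Keep only the rows in which every column is non-missing.

    Hypothetical helper: NaN stands in for a Mata missing value, and all
    columns are assumed to have the same length.
    """
    if not columns:
        return {}
    names = list(columns)
    n = len(columns[names[0]])
    # One pass over the rows decides, once and for all, which observations
    # every later routine will see.
    keep = [i for i in range(n)
            if not any(math.isnan(columns[name][i]) for name in names)]
    return {name: [columns[name][i] for i in keep] for name in names}
```

For example, with y = [1, 2, NaN, 4] and x = [10, NaN, 30, 40], only rows 1 and 4 survive, and both y and x are cut down to those same two rows.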

Thus I argued, although I did not say so explicitly, that writing additional 
code in a category (S) program to deal with missing values is probably a 
waste of time because 

    1.  It is probably better that a category (S) function does not 
        allow missing values, because otherwise, the user of the 
        function may be led into sloppy and dangerous habits.

    2.  Ben Jann <ben.jann@soz.gess.ethz.ch>, who asked the original 
        question, said he was doing this by coding 

             if (missing(x)) _error(3351)

        Good idea, except -missing(x)- can be expensive to calculate.
        The missing() function has to make a pass through the data, 
        looking for missing values, to establish that there are not 
        any.

Hence, even though it is probably better that a category (S) function 
does not allow missing values, there is a cost to imposing that.
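The cost is easy to see in a sketch.  The following is a Python analogue, not Mata: NaN stands in for a missing value, and the hypothetical exception MissingValueError stands in for _error(3351).  When there are no missing values, which is the normal case, the guard must examine every element before the real computation makes its own pass.

```python
import math
from typing import Sequence


class MissingValueError(ValueError):
    """Hypothetical stand-in for Mata's _error(3351)."""


def require_no_missing(x: Sequence[float]) -> None:
    # Analogue of: if (missing(x)) _error(3351)
    # To establish that nothing is missing, every element must be examined.
    for value in x:
        if math.isnan(value):
            raise MissingValueError("missing values not allowed")


def mean_no_missing(x: Sequence[float]) -> float:
    require_no_missing(x)    # first pass: validation only
    return sum(x) / len(x)   # second pass: the actual work
```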

So here is what I add now:

        In a category (S) function that does not accept missing values, 
        it is acceptable to omit 

             if (missing(x)) _error(3351)

        as long as the function does something ugly in the presence of 
        missing values.  The ugly action could be aborting with an error, 
        or it could be a result containing some or all missing values.  
        As long as something ugly happens, the user of the function 
        cannot be misled.
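A sketch of the acceptable case, again in Python as an analogue (NaN for a Mata missing value; the function name is hypothetical): the mean below has no up-front check, but any missing value makes the sum, and therefore the result, NaN.  That is "something ugly", so the caller cannot mistake it for a valid mean, and the extra validation pass is saved.

```python
import math
from typing import Sequence


def mean_unchecked(x: Sequence[float]) -> float:
    # No missing-value guard.  NaN propagates through the sum, so a
    # missing input yields a missing (NaN) result rather than a
    # plausible-looking number.
    return sum(x) / len(x)
```

Calling mean_unchecked([1.0, float("nan"), 3.0]) returns NaN, which math.isnan() reports as missing.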

On the other hand, if the function would return something that could be 
misinterpreted as a valid result, one should probably include 

             if (missing(x)) _error(3351)
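Here is a sketch of a function where the check does earn its keep, written in Python as an analogue with hypothetical names.  It computes the share of observations above a cutoff.  In Python, NaN > cutoff is False, so missing values are silently counted as "not above" and the answer still looks like a valid proportion.  (In Mata the comparison goes the other way, because missing values are treated as larger than any number, but the outcome is equally misleading.)  Nothing ugly happens, so the guard is added.

```python
import math
from typing import Sequence


class MissingValueError(ValueError):
    """Hypothetical stand-in for Mata's _error(3351)."""


def share_above_unchecked(x: Sequence[float], cutoff: float) -> float:
    # NaN > cutoff is False, so missing values are quietly counted as
    # "not above": the result is a plausible but wrong proportion.
    return sum(1 for v in x if v > cutoff) / len(x)


def share_above(x: Sequence[float], cutoff: float) -> float:
    # Analogue of: if (missing(x)) _error(3351)
    # Worth the extra pass here, because the unchecked result would not
    # look wrong.
    if any(math.isnan(v) for v in x):
        raise MissingValueError("missing values not allowed")
    return share_above_unchecked(x, cutoff)
```

With x = [1, 5, NaN, 7] and a cutoff of 4, the unchecked version returns 0.5, a perfectly believable number, whereas share_above() refuses to answer.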


-- Bill
wgould@stata.com



© Copyright 1996–2014 StataCorp LP