Statalist



st: RE: suggestion for missing()


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: suggestion for missing()
Date   Mon, 15 Sep 2008 18:13:56 +0100

In case it gets lost, I'll stick in here a reminder that -dropmiss-
exists to do what Jeph does in his examples. -search- for locations. 

On the main point: 

I've wanted something like this more than once, so I sympathise. 

Whether this is really a good idea I don't know. It may look cosmetic,
but it is rather a fundamental change to Stata's syntax, and it would
introduce a diversity of allowable syntaxes when consistency is arguably
a very good thing. 

If this were done, then it should be done consistently across similar
functions such as -max()- and -min()- as well. 

However, I think Jeph introduces some red herrings here. His choice of
terminology confuses several intersecting issues. Some of the fault
is Stata's, in that when -egen- was introduced the members of its family
were called -egen- functions. I don't have a better name to suggest, but
I think this similarity has been widely (although not deeply) confusing.


First off, note that, despite similar names, functions and -egen-
functions are really quite different beasts. Stata's functions are not
that different from functions in many other languages, but -egen-
functions are very idiosyncratic. The name really is exact: -egen-
functions work __only__ with -egen-. 

Jeph mentions -rowmiss()- and -rowtotal()- and calls them row operators.
They are, strictly, -egen- functions. The fact that they are defined to
work across rows, meaning strictly observations, is just that, a fact.
-egen- functions could have any syntax for their argument that you
wanted. Some syntaxes would seem perverse but anything programmable is
possible so long as it passes -egen-. 
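
As a minimal sketch of an -egen- function taking a varlist, here is
-rowmiss()- applied to the auto dataset shipped with Stata (the dataset
and the variable name nmiss are just choices for illustration, not
anything from Jeph's data). It gets close to what Jeph wants in two
steps:

   sysuse auto, clear
   * count missing values in each observation across a varlist
   egen nmiss = rowmiss(price-foreign)
   * drop observations with any missing value, then tidy up
   drop if nmiss > 0
   drop nmiss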

Jeph then goes on to talk about column operators, but here his informal
use of terminology becomes, potentially, rather misleading. 

Operators in most languages, although certainly not all, seem to be
distinguished from functions largely by whether they are implemented via
special symbols (e.g. + - * | &) or via names. That is an accident of
implementation which we could ponder, but, keeping to the point, let me
just underline that when Jeph says column operators I think he means
Stata functions, strict sense. 

Such functions are not designed to work with columns, meaning strictly
variables, or indeed anything in particular. They are designed to work
with anything that satisfies their syntax. Whether I say
-missing(1,2,3,4)- or -missing(a[1], a[2], a[3], a[4])- or -missing(x,
y, z)- is all one to -missing()- so long as the arguments fit the
syntax. The results in context will differ because the rest of Stata is
so smart, but I think -missing()- is just a mindless machine.
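
A small sketch of that point, assuming the auto dataset shipped with
Stata (my choice of example only): -missing()- takes any
comma-separated expressions and returns 1 if any of them is missing and
0 otherwise. What differs is the context in which Stata evaluates them.

   * literal arguments: the first displays 0, the second displays 1
   display missing(1, 2, 3, 4)
   display missing(1, ., 3)

   sysuse auto, clear
   * explicitly subscripted values of a variable
   display missing(rep78[1], rep78[3])
   * variable names after -if- are evaluated observation by observation
   count if missing(rep78, price)

With -display-, a bare variable name would be evaluated in the first
observation only; after -if-, the same expression is evaluated for each
observation in turn.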

This is mostly just yet another plea to use Stata's terminology when
discussing Stata! 

Nick
n.j.cox@durham.ac.uk 

Jeph Herrin

This is mostly a suggestion to StataCorp; perhaps it
has been made or explained elsewhere.

The function -missing()- is quite useful, but I'd
like to propose that it be modified to take a -varlist-
as argument.

First, it would be even more useful if one could specify
many variable names using shorthand. E.g., why not

   drop if missing(q1-q23)

or

   drop if missing(_all)

?

Second, this would be consistent with other row operators
such as -rowmiss- & -rowtotal-, which take varlists. At
least, it seems like that is the Stata convention - row
operators take varlists, column operators take comma
separated lists. Perhaps I'm wrong on this, but it seems
enough of a convention that I invariably try to stick a
varlist in -missing()- anyway.
