
st: RE: RE: RE: statalist-digest V4 #2935 - strange world

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: RE: statalist-digest V4 #2935 - strange world
Date   Tue, 8 Jan 2008 16:39:00 -0000

I am with StataCorp on this, not that they need my support. 

More importantly, exasperation on this score is not 
a substitute for an explanation of what you think 
Stata should be doing instead and a demonstration 
of how that in turn would be unproblematic. 

I do agree that long-standing users as well as new users can from time 
to time forget the implications of what they write and 
get bitten by this: me too, but that's not the main issue. 

The main issue is the insinuation that Stata is being 
illogical here. I can't see any basis for that in Allan's 
or Tom's posts, however colourfully or strongly they write. 
On the contrary, the objections to Stata's practice arise 
because _logic_ is biting when users in fact _mean_ something else. 
There is tension between syntax and semantics. But, simply yet
crucially, the problem is that what you _mean_ is in your head, 
and Stata can only work on the basis of what you say.  

The first step in Stata's position is that missing numeric 
values must themselves be assigned a numeric value. This 
follows from the fact that when observations are -sort-ed on a numeric 
variable, those with missings must go somewhere. Anywhere "in 
the middle" of a numeric range would be absurd, so there are two
choices: missings could be treated as arbitrarily high, or as arbitrarily low. 
Stata chose the first, as we know. If they had chosen 
the second, that would have been equally justifiable: we 
would just be fielding questions arising from the use of <, not >. 
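
A minimal sketch of the two conventions, written in Python rather than Stata purely as a model. Representing "." as None, and the names stata_sort_key and low_missing_sort_key, are hypothetical. Either key gives missings a definite place in the sort order; they differ only in which end the missings go to.

```python
from typing import Optional

# Hypothetical model: None stands in for Stata's system missing value ".".
Value = Optional[float]

def stata_sort_key(v: Value) -> tuple[int, float]:
    """Stata's choice: missings sort after every number (arbitrarily high)."""
    # Non-missing values form group 0 and sort by value; missings form
    # group 1, so they always land at the end.
    return (1, 0.0) if v is None else (0, v)

def low_missing_sort_key(v: Value) -> tuple[int, float]:
    """The equally defensible alternative: missings sort first (arbitrarily low)."""
    return (0, 0.0) if v is None else (1, v)

data: list[Value] = [50.0, None, 7.0, 42.0, None, 99.0]
print(sorted(data, key=stata_sort_key))        # [7.0, 42.0, 50.0, 99.0, None, None]
print(sorted(data, key=low_missing_sort_key))  # [None, None, 7.0, 42.0, 50.0, 99.0]
```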

The second step then concerns what is to be done with missings given 
an inequality > or <. (The same issues arise with >= or <=, 
but I won't spell that out below, as making the inequality weak 
rather than strict is, I guess, an issue for no-one.) 

It follows from Stata's first step that missings are 
greater than any non-missing value, and so are, for example, included in 
inequalities like 

if x > 42 

but not included in inequalities like 

if x < 42 

Now that's an awkward asymmetry, but logic is under no obligation to be charming.
The consequences of a set of rules can be surprising or even unpleasant,
but there we go. 
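
The asymmetry can be modelled in a few lines. This is a Python sketch, not Stata code: None stands in for ".", and stata_gt and stata_lt are hypothetical helpers that apply the single rule that a missing exceeds every number.

```python
from typing import Optional

Value = Optional[float]  # None models Stata's "." (hypothetical representation)

def stata_gt(x: Value, c: float) -> bool:
    """x > c under the convention that missing exceeds every number."""
    return True if x is None else x > c

def stata_lt(x: Value, c: float) -> bool:
    """x < c under the same convention: a missing is never below a number."""
    return False if x is None else x < c

xs: list[Value] = [10.0, 42.0, 50.0, None, 60.0, None]
above = [x for x in xs if stata_gt(x, 42)]
below = [x for x in xs if stata_lt(x, 42)]
print(above)  # [50.0, None, 60.0, None]  missings included
print(below)  # [10.0]                    missings excluded
```

Neither function is inconsistent with the other; the asymmetry falls straight out of the one ordering rule.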

Now in practice it's often (but certainly not always) true that when
users write 

if x > 42 

they don't have in mind the missings. If so, then they just need to say so
explicitly, for example 

if x > 42 & !missing(x) 
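
A hypothetical Python model of that point (None stands in for ".", and the helper missing() mimics Stata's missing() function) shows that the literal and the intended conditions differ only in the explicit exclusion of missings.

```python
from typing import Optional

Value = Optional[float]  # None models Stata's "." (hypothetical representation)

def missing(x: Value) -> bool:
    """Stand-in for Stata's missing() function."""
    return x is None

def stata_gt(x: Value, c: float) -> bool:
    """x > c under the rule that missing exceeds every number."""
    return True if x is None else x > c

xs: list[Value] = [10.0, 50.0, None, 60.0]
# Literal condition, like "if x > 42": the missing is selected.
literal = [x for x in xs if stata_gt(x, 42)]
# Intended condition, like "if x > 42 & !missing(x)": the missing is dropped.
intended = [x for x in xs if stata_gt(x, 42) and not missing(x)]
print(literal)   # [50.0, None, 60.0]
print(intended)  # [50.0, 60.0]
```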

But, seriously, what are the alternatives? Allan and Tom 
seem to want "if x > 42" to ignore missings on -x-. If that 
were so, then it would solve one problem only to replace 
it with at least four others, on quite different levels: 

1. Stata is now inconsistent. Missings are assigned precise 
numeric values for some purposes (e.g. -sort-ing) but not others. 

2. What, under that proposal, would be the truth value of 
an expression, say 

. > 42 

We need to know that for all sorts of reasons, quite apart from the
selection of observations (which seems to be by far the most common
source of complaints under this heading). If that expression is to be
considered either true or false, then either decision implies
inconsistency with other parts of Stata. See also 1. (Another
alternative is some three-way logic. 
I won't discuss that here, but StataCorp have, in my view rightly,
considered that closely and decided it's not the solution.) 

3. Designing a language according to what users are supposed to mean, 
rather than what they say, is, in my experience, a very long, very
slippery slope to perdition. If you are in charge of a program, you can design 
it exactly the way you want. If you are offering the program to others, 
the only way that will work well is if there is a mutually understood
set of rules. (This mailer and a word processor I am obliged to use sometimes
try to guess what I mean, and both are confounded nuisances.) 

4. Declaring this behaviour now to be a bug, or at least a misfeature,
would imply a major change in Stata. Goodness knows how many scripts, programs
and understandings would be broken by such a change, even under -version-
control. (Allan does flag this, but it's worth underscoring.) 
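
For comparison, IEEE 754 floating point takes roughly the route rejected above: NaN has no place in the numeric order, and every ordered comparison with it is False. The Python sketch below uses Python floats, not Stata, and select_gt and select_not_le are hypothetical helpers. It shows the kind of inconsistency that points 1 and 2 anticipate: two conditions a user would read as equivalent select different observations, and sorting still needs an explicit rule for where NaN goes.

```python
import math

nan = float("nan")  # Python's IEEE 754 "not a number", used here as a missing

# Every ordered comparison involving NaN is False, so NaN is
# neither greater than 42 nor less than or equal to 42.
print(nan > 42, nan <= 42)  # False False

def select_gt(xs, c):
    """Keep x where x > c; NaN is silently dropped."""
    return [x for x in xs if x > c]

def select_not_le(xs, c):
    """Keep x where not (x <= c); reads the same, but NaN is kept."""
    return [x for x in xs if not (x <= c)]

xs = [10.0, nan, 50.0]
print(len(select_gt(xs, 42)), len(select_not_le(xs, 42)))  # 1 2

# Sorting still needs an explicit rule: this key puts NaN last, which is
# in effect Stata's "missing is arbitrarily high" convention.
print(sorted(xs, key=lambda x: (math.isnan(x), x)))  # [10.0, 50.0, nan]
```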

I could go on, but this is long enough. There is no disputing the difficulties
this aspect of Stata's language can cause. But I am at a complete loss
to know how it could be fixed without affecting -if-, or the
interpretation of >, or the treatment of missings in ways that would be
immensely more awkward than the problem complained of here. 


Steichen, Thomas J.

I'm with Allan on this. Implementing "if x>y" to evaluate
as True when x is missing is a logical flaw and should be
corrected. After 10+ years with Stata, I still occasionally
fall into this trap. I figure that if I can deal with -index()-
being changed to... now what was it?... I can deal with a
real flaw being fixed!

Allan Reese

Others have pointed out that "if x>y" in Stata evaluates as True when x
is missing "."

I've raised this before and had to accept as a feature of Stata that "."
is a big number and "computers do what you tell them, not what you
want."  Nevertheless, I remain of the opinion that it is
counter-intuitive, logically incorrect, and undoubtedly leads to
computer-assisted errors.  Changing the operation of Stata now would
inconvenience most current users, but it would not be inconsistent if
the kernel were adapted to output a warning after such calculations,
such as "Missing values included - check your results".

It's indeed a strange world where the priests of IT can claim "user
error" when you fall into a trap they set.  Software will at some time
come under the remit of health and safety legislation - IT's doin' me
'ed in!
