
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: RE: RE: statalist-digest V4 #2935 - strange world
Date: Tue, 8 Jan 2008 16:39:00 -0000

I am with StataCorp on this, not that they need my support. More importantly, exasperation on this score is not a substitute for an explanation of what you think Stata should be doing instead and a demonstration of how that in turn would be unproblematic.

I do agree that long-standing users as well as new users can from time to time forget the implications of what they write and get bitten by this: me too, but that's not the main issue. The main issue is the insinuation that Stata is being illogical here. I can't see any basis for that in Allan's or Tom's posts, however colourfully or strongly they write. On the contrary, the objections to Stata's practice arise because _logic_ is biting when users in fact _mean_ something else. There is tension between syntax and semantics. But, simply yet crucially, the problem is that what you _mean_ is in your head, and Stata can only work on the basis of what you say.

The first step in Stata's position is that missing numeric values must themselves be assigned a numeric value. This follows from the fact that when -sort-ed on a numeric value, observations with missings must go somewhere. Anywhere "in the middle" of a numeric range would be absurd, so there are two solutions: missings could be treated as arbitrarily high, or as arbitrarily low. Stata chose the first, as we know. If they had chosen the second, that would have been equally justifiable: we would just be fielding questions arising from the use of <, not >.

The second step is then over what is to be done with missings given an inequality > or <. (The same issues arise with >= or <=, but I won't spell that out below, as making the inequality weak not strong is, I guess, an issue for no-one.) It follows from Stata's first step that missings are > any non-missing value, and so, for example, included in inequalities like

    if x > 42

but not included in inequalities like

    if x < 42

Now that's an awkward asymmetry, but logic doesn't feature in charm school.
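The two steps can be seen in a few lines. This is only a minimal sketch: the variable -x- and its three values are made up for illustration and are not taken from anyone's data in this thread.

```stata
* Toy data, invented for illustration: two non-missing values and one missing
clear
input x
10
50
.
end

* Step one: when sorted, the missing value goes after every non-missing value
sort x
list x

* Step two: the same ordering decides the inequalities
* This counts 2 observations: 50 and the missing value
count if x > 42
* This counts 1 observation: 10 only
count if x < 42
```

The asymmetry shows up in the two counts: the missing observation is picked up by > but not by <.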
The consequences of a set of rules can be surprising or even unpleasant, but there we go.

Now in practice it's often (but certainly not always) true that when users write

    if x > 42

they don't have in mind the missings. If so, then they just need to say so.

But, seriously, what are the alternatives? Allan and Tom seem to want "if x > 42" to ignore missings on -x-. If that were so, then it would solve one problem only to replace it with at least four others, on quite different levels:

1. Stata is now inconsistent. Missings are assigned precise numeric values for some purposes (e.g. -sort-ing) but not others.

2. What, under that proposal, would be the truth value of an expression, say

    . > 42

We need to know that for all sorts of reasons, quite apart from the selection of observations (which seems to be by far the most common source of complaints under this heading). If that expression is to be considered either true or false, then either decision implies inconsistency with other parts of Stata. See also 1. (Another alternative is some three-way logic. I won't discuss that here, but StataCorp have, in my view rightly, considered that closely and decided it's not the solution.)

3. Designing a language according to what users are supposed to mean, rather than what they say, is, in my experience, a very long, very slippery slope to perdition. If you are in charge of a program, you can design it exactly the way you want. If you are offering the program to others, the only way that will work well is if there is a mutually understood logic. (This mailer and a word processor I am obliged to use sometimes try to guess what I mean, and both are confounded nuisances.)

4. Declaring this behaviour now to be a bug, or at least a misfeature, would be a major change in Stata. Goodness knows how many scripts, programs and understandings would be broken by such a change, even under version control. (Allan does flag this, but it's worth underscoring.)
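For anyone who wants to check the current behaviour, and to see what "saying so" looks like, here is a short sketch. The toy variable -x- and its values are again invented for illustration.

```stata
* Toy data, invented for illustration: two non-missing values and one missing
clear
input x
10
50
.
end

* The truth value at issue: as Stata stands, this displays 1 (true)
display . > 42

* Saying what you mean: exclude missings explicitly
* This counts 1 observation: 50 only
count if x > 42 & !missing(x)

* An equivalent idiom: only non-missing values are less than .
count if x > 42 & x < .
```

Either condition selects only the observation with 50. Which form to prefer is a matter of taste; -missing()- states the intent more directly.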
I could go on, but this is long enough. There is no disputing the irritation this aspect of Stata's language can cause. But I am at a complete loss to know how it could be fixed without affecting -if-, or the interpretation of >, or the treatment of missings in ways that would be immensely more awkward than the problem complained of here.

Nick
[email protected]

Steichen, Thomas J.
===================

I'm with Allan on this. Implementing "if x>y" to evaluate as True when x is missing is a logical flaw and should be corrected. After 10+ years with Stata, I still occasionally fall into this trap.

I figure that if I can deal with -index()- being changed to... now what was it?... I can deal with a real flaw being fixed!

Allan Reese
===========

Others have pointed out that "if x>y" in Stata evaluates as True when x is missing "."

I've raised this before and had to accept as a feature of Stata that "." is a big number and "computers do what you tell them, not what you want." Nevertheless, I remain of the opinion that it is counter-intuitive, logically incorrect, and undoubtedly leads to computer-assisted errors. Changing the operation of Stata now would inconvenience most current users, but it would not be inconsistent if the kernel were adapted to output a warning after such calculations "Missing values included - check your results".

It's indeed a strange world where the priests of IT can claim "user error" when you fall into a trap they set. Software will at some time come under the remit of health and safety legislation - IT's doin' me 'ed in!

Follow-Ups:
- st: RE: RE: RE: RE: statalist-digest V4 #2935 - strange world
  From: "Steichen, Thomas J." <[email protected]>
- Re: st: RE: RE: RE: statalist-digest V4 #2935 - strange world
  From: "Austin Nichols" <[email protected]>

References:
- st: RE: statalist-digest V4 #2935 - strange world
  From: "Allan Reese (Cefas)" <[email protected]>
- st: RE: RE: statalist-digest V4 #2935 - strange world
  From: "Steichen, Thomas J." <[email protected]>

