Nick Cox writes (in part):
>The first step in Stata's position is that missing numeric
>values must themselves be assigned a numeric value. This
>follows from the fact that when -sort-ed on a numeric
>value, observations with missings must go somewhere.
>The second step is then over what is to be done with missings given
>an inequality > or <.
>But, seriously, what are the alternatives?
I will take, at face value, Nick's question about alternatives
and see if I can provide answers.
To do so, I'll start with Nick's argument above that "missing numeric
values must themselves be assigned a numeric value. This follows from the
fact that when -sort-ed on a numeric value, observations with
missings must go somewhere."
It seems evident that missing values could be ignored in a sort and
then _arbitrarily_ placed either first or last, without consideration
of their "assigned" numeric value. There is therefore no need to treat
any "assigned" value for missing as a legitimate, sortable "numeric"
value. While I agree that missings need to be assigned
a value for storage purposes, I can't agree that that value should
fit into the ordered numeric system. Clearly, if I knew it was a
big (or small) value, then it wouldn't truly be missing.
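
To make the sorting point concrete, here is a minimal sketch for a
do-file. The variable names x and xmiss and the three values are made
up for illustration. The placement of missings is decided by an
explicit flag, not by whatever numeric value the missing code happens
to be stored as.

```stata
clear
input x
42
.
7
end
* Flag missings explicitly instead of relying on their stored value.
generate byte xmiss = missing(x)
* Nonmissing values in order, missings last: 7, 42, .
sort xmiss x
list
* Same nonmissing order, missings first: ., 7, 42
gsort -xmiss x
list
```

Either placement is equally defensible, which is the point: where the
missings go is a convention, not a statement about their magnitude.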
>Allan and Tom
>seem to want "if x > 42" to ignore missings on -x-. If that
>were so, then it would solve one problem only to replace
>it with at least four others, on quite different levels:
>1. Stata is now inconsistent. Missings are assigned precise
>numeric values for some purposes (e.g. -sort-ing) but not others.
Given my argument above, a precise numeric value is not needed
for sorting, so there is no inconsistency.
>2. What, under that proposal, would be the truth value of
>an expression, say . > 42
Let me rewrite this as "local y = ." and then ask what truth value I
would assign to `y' > 42. Answer: undefined (i.e., missing),
the same as the result of local x = `y'*42. Since I don't know the value
of y, why should I know the truth value of a statement about y,
or the result of any other operation on it?
Interpreting the truth statement this way would be highly
consistent with the way Stata handles missing elsewhere.
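
A short sketch of the contrast, again for a do-file. The first two
lines show how Stata currently treats arithmetic and comparison on
missing. The rest emulates the "undefined" answer using an -if-
qualifier. The names x and gt42 and the data values are hypothetical.

```stata
* Arithmetic on missing propagates missing: displays .
display . * 42
* Comparison does not: displays 1 (true), since . ranks above every number.
display . > 42

clear
input x
42
.
100
end
* Leave the comparison result missing wherever x itself is missing.
generate byte gt42 = x > 42 if !missing(x)
* gt42 is 0, ., 1
list
```

The last -generate- is what I would expect the bare comparison to do
on its own: an unknown input gives an unknown answer.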
>3. Designing a language according to what users are supposed to mean,
>rather than what they say, is, in my experience, a very long, very
>slippery slope to perdition.
Designing a language that ignores users' normal logical
interpretation is also, to quote Nick, "a very long, very slippery slope
to perdition." I doubt that most people would expect a logical
operation on an unknown value to have a known truth value (other
than an operation which asks whether the value is known). It could
well be that StataCorp's choice to subject users to this particular
trap is a reason the package is not chosen for use by more users.
I know I deal with missing values in most of my datasets;
unfortunately I cannot say with certainty that I have not blown
some analyses because of this implementation choice. I hope not
but I'm also glad my management has never asked me to justify
using a package that has this known trap.
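
For anyone who has not run into it, the trap looks like this in
practice. It is a minimal do-file sketch with made-up data in a
hypothetical variable x.

```stata
clear
input x
42
.
7
100
end
* Counts 2: the observation with x = 100 and the one with x missing.
count if x > 42
* Counts 1: the missing observation is excluded explicitly.
count if x > 42 & !missing(x)
```

The first form is the one most people would type, and nothing in the
output warns that a missing value was counted as "greater than 42".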
>4. Declaring this behaviour now to be a bug, or at least a misfeature,
>would be a major change in Stata. Goodness knows how many scripts, programs
>and understandings would be broken by such a change, even under version
>control.
I suspect StataCorp's programmers could minimize the impact.
However, even if they could not implement the change without
breaking some of my code, I'd rather pay the price of rewriting
code than continue risking a bad analysis. Sometimes the price
of a greater good is some immediate pain. While instances
of Stata changes breaking (properly implemented) user code
are rare, they have occurred and we survived.
---------
I fully understand StataCorp's right to implement Stata in
the way they believe best, but I also see value in changing the
current implementation of comparisons to missing. I believe there
would be value to me as a user (avoidance of bad analyses) and to
StataCorp (removing an implementation choice that some (many?)
users believe is wrong and that, perhaps, could be a reason for
not choosing Stata as their statistical package).
Tom