Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: RE: RE: RE: RE: statalist-digest V4 #2935 - strange world


From   "Joseph Coveney" <[email protected]>
To   "Statalist" <[email protected]>
Subject   Re: st: RE: RE: RE: RE: statalist-digest V4 #2935 - strange world
Date   Wed, 9 Jan 2008 07:58:49 +0900

Thomas J. Steichen wrote:

Nick Cox writes (in part):

The first step in Stata's position is that missing numeric
values must themselves be assigned a numeric value. This
follows from the fact than when -sort-ed on a numeric
value, observations with missings must go somewhere.

The second step is then over what is to be done with missings given
an inequality > or <.

But, seriously, what are the alternatives?
I will take, at face value, Nick's question about alternatives
and see if I can provide answers.

To do so, I'll start with Nick's argument above that "numeric values
must themselves be assigned a numeric value. This follows from the
fact than when -sort-ed on a numeric value, observations with
missings must go somewhere."

It seems evident that missing values could be ignored in a sort
then _arbitrarily_ placed either first or last, without consideration
of their "assigned" numeric value. Therefore the need to consider
any "assigned" value for missing as a legitimate, sortable "numeric"
value is unnecessary. While I agree that missing need to be assigned
a value for storage purposes, I can't agree that that value should
fit into the ordered, numeric system. Clearly, if I knew it was a
big (or small) value, then it wouldn't truly be missing.

--------------------------------------------------------------------------------

SQL does this in its three-valued logic implementation.  In SQL, missingness
is handled as NULL.  NULL is not a value, however; it is a condition.
Therefore, NULL is never equal to any value.  Nevertheless, NULLs sort.

It should be noted, however, this implementation results in its own gotcha
for programmers.  Because NULL isn't a value, NULL isn't even equal to NULL.
A program that attempts selecting observations on the criterion of variable
equaling NULL will fail.  This has resulted in programs not doing what the
programmer intended, so Stata is not alone in this respect.  (SQL has to
implement a special syntax to test for NULLness in the data.  All of this is
illustrated in the do-file below.)

Although SQL's approach appears to do what Tom and Allan seek, there are
those among practitioners of SQL who complain about its implementation of
NULL.  Overall, the situation seems quite analogous to this recurrent
thread.

It seems that there is no ideal answer to the problem.  Because each
approach to the way missing data are handled has its own foibles, I doubt
that anyone who runs away from Stata because of its quirks will be happy for
very long with the alternative run to.  I've grown used to both the way
Stata and the way SQL handle the matter.  I happen to like Stata's better.
It allows for .a to .z, which are extraordinarily useful--this, alone,
trumps all of the alternatives' approaches.*  Stata's also seems easier to
me to write clear and compact code in situations where I need to test for
missingness concomitantly with doing other logical tests.  I can see,
though, where Allan's call for a message might be helpful.

Joseph Coveney

*It's a pain to deal with numerical data in SQL where the result might be
missing for reasons that need to be kept distinguished because they affect
statistical inference in different ways, e.g., patient refused to undergo
procedure, patient forgot about appointment, patient too ill to undergo
test, result below limit of quantitation for assay method, test not
applicable in men, test result still pending, test tube accidentally dropped
on floor by technician, apparent laboratory instrument error, not known why
missing.

clear *
set more off
copy http://www.fda.gov/orphan/designat/list.xls Dummy.xls
local dsn Excel Files;DBQ=Dummy.xls;
odbc exec("CREATE TABLE Test (column_1 INT NULL, column_2 INT NULL)"), ///
 dsn("`dsn'")
odbc ///
 exec("SELECT * INTO [Excel 8.0;Database=Test.xls].[Test] FROM Test"), ///
 dsn("`dsn'")
erase Dummy.xls
local dsn Excel Files;DBQ=Test.xls;
odbc exec("INSERT INTO Test VALUES (1, 1)"), dsn("`dsn'")
odbc exec("INSERT INTO Test VALUES (2, NULL)"), dsn("`dsn'")
odbc exec("INSERT INTO Test VALUES (3, 3)"), dsn("`dsn'")
* NULL sorts last
odbc exec("SELECT column_1 FROM Test ORDER BY column_2"), dsn("`dsn'")
* NULL is not equal to NULL (no returned results)
odbc exec("SELECT * FROM Test WHERE column_2 = NULL"), dsn("`dsn'")
* NULL is not greater than or less than any value
odbc exec("SELECT * FROM Test WHERE column_2 > column_1"), dsn("`dsn'")
odbc exec("SELECT * FROM Test WHERE column_2 < column_1"), dsn("`dsn'")
* Special syntax for handling of NULL condition
odbc exec("SELECT column_1 FROM Test WHERE column_2 IS NULL"), dsn("`dsn'")
odbc exec("SELECT column_1 FROM Test WHERE column_2 IS NOT NULL"), ///
 dsn("`dsn'")
erase Test.xls
exit


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index