
From: wgould@stata.com (William Gould)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Missing Values in Matrices
Date: Wed, 25 Sep 2002 11:00:12 -0500

Roger Newson <roger.newson@kcl.ac.uk>, in response to a reply I just wrote, asked

> What precisely is a NaN? And does it have any connection with missing
> values, or with the "magic number" 1e300 mentioned in -[R] tabstat-? I
> can't find any reference to NaNs in -[R] matrix define-.

Excuse me for using jargon. NaN stands for "Not a Number" as defined by the IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985. That standard defines how the coprocessor on your computer works. NaN is a way of encoding missing values, but it is not the one that Stata uses.

Were we reimplementing Stata from scratch today, we would probably adopt the NaN standard. This all has to do with the way Stata works internally and you would not care one way or the other, but it probably would simplify our lives a little here at StataCorp.

When Stata was first implemented, this IEEE standard was still not widely accepted, so we developed our own. You need to think back to the time before coprocessors. The C compiler we used to compile Stata used the IEEE standard but Microsoft's BASIC, for instance, used IBM's COMP-3 standard, which was in wide use because it was used on the then-popular System/370.

With the introduction of the Intel floating-point coprocessors, the IEEE standard did catch on, but the early implementations really did not follow it very carefully. Intel's chips followed the standard, but most people did not have coprocessors and instead software was used to emulate the behavior of the chip. Emulate is a poor choice of words here. Aped was more like it.

All of that is cleaned up now, but even as recently as three or four years ago I remember struggling to deal with different interpretations of the "standard", which is clear enough on what a NaN is but not on how it is to be used. The problem arose in Stata's behavior across platforms.
Remember, we support Windows, Macintosh, and Unix (lots of them), and there is no agreement among them on what should happen, for instance, when you divide by zero.

There is an old tradition called the "exception", which means the computer crashes. Exceptions, however, can be intercepted and you can avoid the crash. Then there is the modern idea of a NaN: divide by zero and you get a NaN, not a crash. In the early days, even after adoption of the IEEE standard, lots of computers continued to yield exceptions on errors rather than NaNs.

I use divide by zero just as an example, but I think the problem was actually exp(x) for x too small, which can lead to an "Underflow exception" (old tradition) or NaN (modern tradition). It does not matter; the point is that we had some computers yielding exceptions and others yielding NaNs, this time because the more "modern" computers were switching over to the NaN idea.

So now there is lots of code inside Stata protecting it from both traditions. Whether an exception or a NaN arises, Stata maps the result to its own concept of a missing value as quickly as it can and handles it from there.

How consistent are things nowadays? I really do not know. The protection code inside Stata prevents me from knowing because it so effectively covers up the problem.

Now you know more than you ever wanted.

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
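
The two traditions described above, and the idea of mapping both to a package-specific missing value, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentinel MISSING and the helper to_missing() are hypothetical names invented for this example, and nothing here reflects Stata's actual internal code or its real missing-value encoding.

```python
import math

# Hypothetical sentinel standing in for a package's own missing-value code.
# It is NOT Stata's actual internal representation.
MISSING = "."


def to_missing(compute, *args):
    """Run compute(*args); map an arithmetic exception or a NaN to MISSING."""
    try:
        result = compute(*args)
    except ArithmeticError:
        # "Old tradition": the operation raised an exception
        # (ZeroDivisionError and OverflowError are both ArithmeticError).
        return MISSING
    if isinstance(result, float) and math.isnan(result):
        # "Modern tradition": the operation quietly returned a NaN.
        return MISSING
    return result


def divide(a, b):
    # Python raises ZeroDivisionError here when b is zero.
    return a / b


def inf_minus_inf():
    # IEEE arithmetic yields a NaN here; no exception is raised.
    return math.inf - math.inf


nan = float("nan")
print(nan == nan)                      # a NaN compares unequal even to itself
print(to_missing(divide, 1.0, 0.0))    # exception path
print(to_missing(inf_minus_inf))       # NaN path
print(to_missing(divide, 1.0, 4.0))    # ordinary result passes through
```

The caller sees the same missing marker no matter which tradition the underlying arithmetic follows, which is the effect the protection code described above is meant to achieve.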


© Copyright 1996–2015 StataCorp LP