
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Rounding error (?) from substr to (double) real
Date: Wed, 26 Mar 2003 10:53:56 -0000

Jean Marie Linhart
>
> Nick Cox <n.j.cox@durham.ac.uk> wrote:
>
> > 1. To keep every digit in a numeric identifier that is interpretable
> > as an integer, use -long- not -double-. The very large numbers which
> > can approximately be held in a -double- obscure the fact that
> > even 8-digit integers cannot all be held exactly, giving rise to
> > anomalies such as those you experienced.
>
> I think Nick mistyped here. He meant that doubles cannot hold 16
> digit integers. They do just fine with 8 digit integers.
>
> Why is this?
>
> If I can explain this coherently and without any typos, IEEE double
> precision numbers have 64 total bits (binary digits) broken down into:
> 1 bit for the sign, 11 bits for the binary exponent and 52 bits for
> the binary fraction. It is the binary fraction that determines the
> precision. The binary fraction is intended to represent a binary
> number between 1 and 2, i.e., there is an assumed 1 at the front, we
> really have 1.F where F is the fractional part that is stored in the
> 52 bits. Any nonzero number can be written this way by choosing the
> correct exponent. This gives us a precision of 1/2^53. Since
> 1e-15 > 1/2^53 > 1e-16, this means we expect to get 15 digits.
> Sometimes we will get 16, but not always.
>
> For more information, you may like to see:
>
> http://www.scri.fsu.edu/~jac/MAD3401/Backgrnd/ieee.html
>
> Or do web searches on "IEEE floating point"

Thanks for the correction and detailed analysis.

FWIW, I was picking up on Ann Flanagan's original report

> I have a set of data with a string variable 13 characters in length,
> containing a unique school district identifier -- the first eight
> characters some of which have a leading zero. The remaining five
> characters identify the schools within the districts. I need the
> district identifier to be "real" for collapsing the data to the
> district level.
>
> Here's what I do
>
> gen str8 district=substr(rcds,1,8)
> gen double dno =real(district)
> format dno %08.0f
>
> When I list the data and/or run -xtgee- on the dataset, there are
> rounding errors such that:
>
> rcds==4000704000001
> rcds==4000704200001
>
> both return a district number of 40007040 and I lose districts in
> estimation.

That is, Ann reported that real("40007040") and real("40007042")
are both held as 40007040 in a double. However, a check confirms
Jean-Marie's analysis: this is not true, so there is a small
puzzle remaining here.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
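The check referred to in the message can be sketched roughly as follows. This is an illustration only, not the code that was actually run: it assumes a scratch dataset holding just the two rcds values quoted in the report, reuses the three commands from that report, and adds -display- and -assert- lines that are not in the original. The 2^53 limit comes from the 53 significant bits (52 stored plus the assumed leading 1) described in the quoted analysis.

```stata
* Scratch dataset with the two identifiers from the report (assumed setup)
clear
input str13 rcds
"4000704000001"
"4000704200001"
end

* The commands exactly as given in the report
gen str8 district = substr(rcds, 1, 8)
gen double dno = real(district)
format dno %08.0f
list rcds district dno

* Added for illustration: the two district numbers stay distinct in a double
assert dno[1] == 40007040
assert dno[2] == 40007042

* Added for illustration: where a double does run out of integer precision.
* 2^53 is held exactly, but 2^53 + 1 is not and rounds back to 2^53.
display %20.0f 2^53
assert 2^53 + 1 == 2^53
```

If the -assert- lines pass silently, the 8-digit identifiers are being held exactly, which is consistent with the analysis above and leaves the reported collision unexplained by -double- storage alone.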

**References**:

- **Re: st: Rounding error (?) from substr to (double) real**
  *From:* jlinhart@stata.com (Jean Marie Linhart, StataCorp.)

