From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Rounding error (?) from substr to (double) real
Date: Wed, 26 Mar 2003 10:53:56 -0000

Jean Marie Linhart
>
> Nick Cox <n.j.cox@durham.ac.uk> wrote:
>
> > 1. To keep every digit in a numeric identifier that is interpretable
> > as an integer, use -long- not -double-. The very large numbers which
> > can approximately be held in a -double- obscure the fact that
> > even 8-digit integers cannot all be held exactly, giving rise to
> > anomalies such as those you experienced.
>
> I think Nick mistyped here. He meant that doubles cannot hold 16
> digit integers. They do just fine with 8 digit integers.
>
> Why is this?
>
> If I can explain this coherently and without any typos, IEEE double
> precision numbers have 64 total bits (binary digits) broken down into:
> 1 bit for the sign, 11 bits for the binary exponent and 52 bits for
> the binary fraction. It is the binary fraction that determines the
> precision. The binary fraction is intended to represent a binary
> number between 1 and 2, i.e., there is an assumed 1 at the front, we
> really have 1.F where F is the fractional part that is stored in the
> 52 bits. Any nonzero number can be written this way by choosing the
> correct exponent. This gives us a precision of 1/2^53.
> Since 1e-15 > 1/2^53 > 1e-16, this means we expect to get 15 digits.
> Sometimes we will get 16, but not always.
>
> For more information, you may like to see:
>
> http://www.scri.fsu.edu/~jac/MAD3401/Backgrnd/ieee.html
>
> Or do web searches on "IEEE floating point"

Thanks for the correction and detailed analysis.

FWIW, I was picking up on Ann Flanagan's original report

> I have a set of data with a string variable 13 characters in length,
> containing a unique school district identifier -- the first eight
> characters some of which have a leading zero. The remaining five
> characters identify the schools within the districts. I need the
> district identifier to be "real" for collapsing the data to the
> district level.
>
> Here's what I do
>
> gen str8 district=substr(rcds,1,8)
> gen double dno =real(district)
> format dno %08.0f
>
> When I list the data and/or run -xtgee- on the dataset, there are
> rounding errors such that:
>
> rcds==4000704000001
> rcds==4000704200001
>
> both return a district number of 40007040 and I lose districts in
> estimation.

That is, Ann reported that real("40007040") and real("40007042")
are both held as 40007040 in a double. However, a check confirms
Jean-Marie's analysis: this is not true, so there is a small
puzzle remaining here.

Nick
n.j.cox@durham.ac.uk
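For anyone who wants to repeat the arithmetic outside Stata, here is a minimal sketch in Python. It assumes only that a Python float is a 64-bit IEEE double, the format described in the analysis above. The variable names are illustrative and not from the thread, and float("...") merely stands in for Stata's real("...").

```python
# Ann's two district identifiers, parsed from strings into IEEE doubles.
a = float("40007040")
b = float("40007042")

# Both are far below 2**53, so each is held exactly and they stay distinct.
print(a == b)                 # False
print(int(a), int(b))         # 40007040 40007042

# With 53 significant bits, every integer up to 2**53 is exact;
# beyond that, neighbouring integers start to collide.
limit = 2 ** 53               # 9007199254740992, a 16-digit number
print(float(limit - 1) == limit - 1)       # True: still exact
print(float(limit + 1) == float(limit))    # True: 2**53 + 1 rounds to 2**53

# The bound quoted in the analysis: 1e-15 > 1/2^53 > 1e-16.
print(1e-15 > 2.0 ** -53 > 1e-16)          # True
```

This agrees with the check reported above: 8-digit integers are held exactly in a double, and trouble only starts near 16 digits. It does not explain what Ann saw.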

