Stata The Stata listserver

RE: st: Rounding error (?) from substr to (double) real


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Rounding error (?) from substr to (double) real
Date   Wed, 26 Mar 2003 10:53:56 -0000

Jean Marie Linhart
>
> Nick Cox <n.j.cox@durham.ac.uk> wrote:
>
> > 1. To keep every digit in a numeric identifier that is interpretable
> > as an integer, use -long- not -double-. The very large numbers which
> > can approximately be held in a -double- obscure the fact that
> > even 8-digit integers cannot all be held exactly, giving rise to
> > anomalies such as those you experienced.
>
> I think Nick mistyped here.  He meant that doubles cannot hold 16
> digit integers.  They do just fine with 8 digit integers.
>
> Why is this?
>
> If I can explain this coherently and without any typos, IEEE double
> precision numbers have 64 total bits (binary digits) broken down into:
> 1 bit for the sign, 11 bits for the binary exponent and 52 bits for
> the binary fraction.  It is the binary fraction that determines the
> precision.  The binary fraction is intended to represent a binary
> number between 1 and 2, i.e., there is an assumed 1 at the front, we
> really have 1.F where F is the fractional part that is stored in the
> 52 bits.  Any nonzero number can be written this way by choosing the
> correct exponent.  This gives us a precision of 1/2^53.
> Since 1e-15 > 1/2^53 > 1e-16, this means we expect to get 15 digits.
> Sometimes we will get 16, but not always.
>
> For more information, you may like to see:
>
> http://www.scri.fsu.edu/~jac/MAD3401/Backgrnd/ieee.html
>
> Or do web searches on "IEEE floating point"
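
A minimal Stata sketch of the limit described above, assuming only the
built-in -display- and -assert- commands. A double carries 53 significant
bits (the 52 stored bits plus the assumed leading 1), so whole numbers are
exact up to 2^53, a 16-digit number, and start to skip values beyond it.

```stata
* Whole numbers up to 2^53 are held exactly in a double.
display %20.0f 2^53
assert (2^53 - 1) + 1 == 2^53

* Beyond 2^53 adjacent doubles are 2 apart, so adding 1 is lost.
assert (2^53 + 1) == 2^53
assert (2^53 + 2) != 2^53

* Any 8-digit integer is far below that limit.
assert 99999999 < 2^53
```

If every -assert- passes silently, the behaviour matches the description:
8-digit integers are safe in a double, while some 16-digit integers are not.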

Thanks for the correction and detailed analysis. FWIW, I was picking
up on Ann Flanagan's original report:

> I have a set of data with a string variable 13 characters in length,
> containing a unique school district identifier -- the first eight
> characters, some of which have a leading zero.  The remaining five
> characters identify the schools within the districts.  I need the
> district identifier to be "real" for collapsing the data to the
> district level.
> Here's what I do:
>
> gen str8 district=substr(rcds,1,8)
> gen double dno =real(district)
> format dno %08.0f
>
> When I list the data and/or run -xtgee- on the dataset, there are
> rounding errors such that:
>
> rcds==4000704000001
> rcds==4000704200001
>
> both return a district number of 40007040 and I lose districts in
> estimation.

That is, Ann reported that real("40007040") and real("40007042")
are both held as 40007040 in a double. However, a check confirms
Jean Marie's analysis: a double holds both values exactly and
distinctly, so the reported behaviour is not reproduced and a
small puzzle remains here.
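
A check of this kind might look like the sketch below. It uses only
-display-, -assert- and the built-in real() and float() functions. The
last two lines are a hypothetical contrast, not a diagnosis of Ann's
data: they only show what rounding to float precision would do to the
same two values.

```stata
* Both eight-digit strings convert to distinct doubles.
display %12.0f real("40007040")
display %12.0f real("40007042")
assert real("40007040") != real("40007042")

* Hypothetical contrast: at float precision the two values coincide.
display %12.0f float(40007042)
assert float(40007040) == float(40007042)
```

Near 40 million, adjacent floats are 4 apart, so 40007042 falls between
two representable values and is rounded to 40007040.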

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


