RE: st: Rounding error (?) from substr to (double) real
Jean Marie Linhart
> Nick Cox <email@example.com> wrote:
> > 1. To keep every digit in a numeric identifier that is
> > an integer, use -long- not -double-. The very large numbers which
> > can approximately be held in a -double- obscure the fact that
> > even 8-digit integers cannot all be held exactly, giving rise to
> > anomalies such as those you experienced.
> I think Nick mistyped here. He meant that doubles cannot hold 16
> digit integers. They do just fine with 8 digit integers.
> Why is this?
> If I can explain this coherently and without any typos, IEEE double
> precision numbers have 64 total bits (binary digits) broken
> down into:
> 1 bit for the sign, 11 bits for the binary exponent and 52 bits for
> the binary fraction. It is the binary fraction that determines the
> precision. The binary fraction is intended to represent a binary
> number between 1 and 2, i.e., there is an assumed 1 at the front, we
> really have 1.F where F is the fractional part that is stored in the
> 52 bits. Any nonzero number can be written this way by choosing the
> correct exponent. Counting the assumed leading 1, that is 53
> significant bits, which gives us a precision of 1/2^53.
> Since 1e-15 > 1/2^53 > 1e-16, this means we expect to get 15 digits.
> Sometimes we will get 16, but not always.
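The arithmetic above can be checked in any language whose floating-point type is an IEEE double. What follows is a minimal sketch in Python rather than Stata, on the assumption that Python's float is the same 64-bit format as Stata's -double-. The names MANT_BITS, LIMIT and survives_double are made up for this illustration and do not come from the thread.

```python
import sys

# Python's float is an IEEE 754 double: 52 stored fraction bits
# plus the assumed leading 1, i.e. 53 significant bits.
MANT_BITS = sys.float_info.mant_dig

# Every integer up to 2^53 is held exactly; above that, gaps appear.
LIMIT = 2 ** MANT_BITS  # a 16-digit number


def survives_double(n: int) -> bool:
    """Return True if integer n is unchanged by a round trip through a double."""
    return int(float(n)) == n


if __name__ == "__main__":
    print("significant bits:", MANT_BITS)
    print("2^53 =", LIMIT)
    print("largest 8-digit integer exact: ", survives_double(99_999_999))
    print("largest 15-digit integer exact:", survives_double(999_999_999_999_999))
    print("2^53 + 1 exact:                ", survives_double(LIMIT + 1))
    print("1e-16 < 2^-53 < 1e-15:         ", 1e-16 < 2.0 ** -MANT_BITS < 1e-15)
```

The 8-digit and 15-digit cases come through unchanged, while 2^53 + 1 (a 16-digit integer) does not, which matches "15 digits, sometimes 16".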
> For more information, you may like to see:
> Or do web searches on "IEEE floating point"
Thanks for the correction and detailed analysis. FWIW, I was picking
up on Ann Flanagan's original report:
> I have a set of data with a string variable 13 characters in length,
> containing a unique school district identifier -- the first eight
> characters, some of which have a leading zero. The remaining five
> characters identify the schools within the districts. I need the
> district identifier to be "real" for collapsing the data to the
> district level.
> Here's what I do:
>     gen str8 district = substr(rcds,1,8)
>     gen double dno = real(district)
>     format dno %08.0f
> When I list the data and/or run -xtgee- on the dataset, there are
> rounding errors such that:
> both return a district number of 40007040 and I lose districts in
That is, Ann reported that real("40007040") and real("40007042")
are both held as 40007040 in a double. However, a check confirms
Jean-Marie's analysis: a double holds both values exactly and keeps
them distinct, so there is a small puzzle remaining here.
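For anyone who wants to repeat that kind of check outside Stata, here is a minimal sketch in Python, again assuming its float is the same IEEE double as a Stata -double-. The helper names as_double and as_single are hypothetical. The single-precision comparison is included only as an assumption about how such a collision could arise if the values passed through 4-byte storage at some step; nothing in the thread establishes that this happened with Ann's data.

```python
import struct


def as_double(text: str) -> float:
    """Convert a digit string to an IEEE double (Python's float)."""
    return float(text)


def as_single(text: str) -> float:
    """Round the value through a 4-byte IEEE single and back.

    Assumption for illustration only: this mimics 4-byte float storage.
    """
    return struct.unpack("f", struct.pack("f", float(text)))[0]


if __name__ == "__main__":
    a, b = "40007040", "40007042"
    # In double precision the two identifiers stay distinct.
    print("equal as doubles:", as_double(a) == as_double(b))
    # In single precision (24 significant bits) integers of this size
    # are spaced 4 apart, so the two identifiers can collide.
    print("equal as singles:", as_single(a) == as_single(b))
    print("single value of b:", int(as_single(b)))
```

In double precision the two identifiers remain different, which agrees with the check reported above. Only the 4-byte round trip maps both to 40007040.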