How many significant digits are there in a float?
Title:   The accuracy of the float data type
Author:  William Gould, StataCorp
Date:    May 2001
float is a storage format used by Stata, not a computation format.
When you have a number stored as a float and you make a calculation,
such as
. gen newvar = sqrt(oldvar)/sqrt(2)
oldvar is retrieved and is promoted to a double. The entire
computation is then made in double precision, and that result is rounded to
a float.
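That promote-compute-round sequence can be sketched outside Stata. The following is a minimal Python stand-in, assuming that a pack/unpack round-trip through 4 bytes is an acceptable equivalent of Stata's float storage; the name to_float is invented for this sketch and is not a Stata or Python builtin:

```python
import math
import struct

def to_float(x):
    """Round a Python double to single (float) precision, as storing
    it in a Stata float would, by packing into 4 bytes and unpacking."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# oldvar is stored as a float ...
oldvar = to_float(2.0)

# ... it is promoted to double, and the whole computation
# runs in double precision ...
result_double = math.sqrt(oldvar) / math.sqrt(2.0)

# ... and only the final result is rounded back to float
newvar = to_float(result_double)

print(result_double, newvar)
```

Only one rounding to float happens, at the very end; the intermediate square roots are never truncated.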
Floats have 7.22 digits of precision, but there is an argument for saying
7.5 digits because it all depends on how you count partial digits.
The way computers store floating-point numbers (not to be confused with
float, because double is also a floating-point format) is

        z = a * 2^p,     1 <= a < 2
Here are some examples of how numbers are stored:
        z       a       p
      ------------------------
        1       1       0
        1.5     1.5     0
        2       1       1
        3       1.5     1      (i.e., 1.5*2^1 = 3)
      ------------------------
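The (a, p) decomposition in the table can be recovered in Python; this is a sketch assuming math.frexp, which returns z = m * 2^e with 0.5 <= m < 1, so one shift of the exponent converts it to the 1 <= a < 2 convention used here (decompose is a name invented for this sketch):

```python
import math

def decompose(z):
    """Return (a, p) with z = a * 2**p and 1 <= a < 2.
    math.frexp gives z = m * 2**e with 0.5 <= m < 1, so double m
    and subtract one from e to match the table's convention."""
    m, e = math.frexp(z)
    return m * 2, e - 1

print(decompose(1.0))
print(decompose(1.5))
print(decompose(2.0))
print(decompose(3.0))   # (1.5, 1), i.e., 1.5*2^1 = 3
```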
In float, 24 bits are allocated for a. Thus the largest
integer that can be stored with all digits exact is 2^0 + 2^1 + ... + 2^23 =
(2^24)-1 = 16,777,215. Well, actually, 2^24 = 16,777,216 is also
precisely stored because it is even, but 16,777,217 cannot be precisely
stored. We can demonstrate this in Stata using the
float() function, which rounds to float precision:
. display float(16777216)
16777216
. display float(16777217)
16777216
Good; Stata works just as theory would suggest.
Now how accurate is float? Well, for numbers like 16,777,217, the
absolute error is 1, so the relative error is
1/16,777,217 = 5.960e-08
Generally, when you store a number z as a float, what is stored
is z', and you can be assured that

        z * (1 - 5.960e-08) <= z' <= z * (1 + 5.960e-08)
How many digits of accuracy is that? I can tell you exactly in binary: 24
binary digits. But how do you express binary digits in base 10? (By the
way, thinking in binary is not difficult here: 24 binary digits means the
smallest relative step is 2^(-24) = 5.96e-08, which is the same
relative accuracy we obtained above.)
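The relative-error bound can be checked empirically; here is a sketch in Python, again assuming a 4-byte struct round-trip as a stand-in for Stata's float() (to_float and EPS are names invented for this sketch):

```python
import random
import struct

def to_float(x):
    """Round a double to single precision via a 4-byte round-trip."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

EPS = 2.0 ** -24   # = 5.96e-08, the relative accuracy claimed above

# check the bound  z*(1 - EPS) <= z' <= z*(1 + EPS)  on random values
random.seed(1)
for _ in range(10000):
    z = random.uniform(1.0, 1e7)
    zp = to_float(z)
    assert z * (1 - EPS) <= zp <= z * (1 + EPS)
print("bound holds")
```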
Returning to decimal, you might start by observing that 16,777,216 has 8
digits, but not every 8-digit number can be stored, so we don't want to claim 8.
One way to get a base-10 representation would be to take
log10(16,777,216) = 7.2247199. That is the way most numerical
analysts would convert digit accuracy between bases, so we could claim 7.22
decimal digits of accuracy.
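A quick check of that conversion, using Python's math.log10 as a stand-in for the hand calculation:

```python
import math

# 24 binary digits expressed in base 10: log10(2^24) = log10(16,777,216)
digits = math.log10(2 ** 24)
print(round(digits, 7))   # 7.2247199
```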
The .22 part of 7.22 is subject to misinterpretation because what we just
called .22 would, by some, be called one-half. Consider 16,777,216 and the
numbers just after it, which are too big to store exactly:
      true number     stored if float
      ----------------------------------------
      16,777,216        16,777,216
      16,777,217        16,777,216
      16,777,218        16,777,218
      16,777,219        16,777,220   [sic]
      16,777,220        16,777,220
      16,777,221        16,777,220
      16,777,222        16,777,222
      16,777,223        16,777,224   [sic]
      16,777,224        16,777,224
      16,777,225        16,777,224
      ----------------------------------------
Basically, odd numbers are being rounded to even numbers. A lot of people
would call this the loss of half the digits in the last place, and we could
develop a different formula that would label that difference 7.5. Sometimes
in computer documentation, you will see the statement that float has 7.5
digits of accuracy. They say that and not 7.22 because the authors worry
that you might misinterpret what 7.22 means.
Label the difference how you wish; there are 24 binary digits, and the
relative accuracy is +/- 2^(-24) = 5.960e-08.
Note: The [sic]s in the above have to do with how numbers ending in
exactly “half” (5 in decimal) are rounded; this is the same
problem as rounding 1.5 and 2.5 to one digit in decimal.
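The table above, including the [sic] entries, can be reproduced in Python under the assumption that single-precision rounding (round-to-nearest, ties to even) matches Stata's float() behavior; to_float is a name invented for this sketch:

```python
import struct

def to_float(x):
    """Round a double to single precision
    (round-to-nearest, ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# reproduce the "true number / stored if float" table
for z in range(16777216, 16777226):
    print(z, int(to_float(z)))
```

The odd numbers are exactly halfway between two storable (even) values, and ties go to the neighbor whose last significand bit is 0, which is why 16,777,219 rounds up to 16,777,220 while 16,777,217 rounds down to 16,777,216.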