Title | The accuracy of the float data type |

Author | William Gould, StataCorp |

**float** is a storage format used by Stata, not a computation format.
When you have a number stored as a **float** and you make a calculation,
such as

. gen newvar = sqrt(oldvar)/sqrt(2)

**oldvar** is retrieved and is promoted to a **double**. The entire
computation is then made in double precision, and that result is rounded to
a **float**.
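The storage-versus-computation distinction can be checked directly. The do-file fragment below is a minimal sketch: the one-observation dataset, the value 3, and the variable name `newvar_d` are made up for illustration and do not come from any real dataset.

```stata
* Hypothetical one-observation dataset; names and values are illustrative only
clear
set obs 1
generate float oldvar = 3

* Same expression, stored once as float and once as double
generate float  newvar   = sqrt(oldvar)/sqrt(2)
generate double newvar_d = sqrt(oldvar)/sqrt(2)

* The float copy should equal the double result rounded to float precision
assert newvar == float(newvar_d)

* Displaying both shows where the stored float departs from the double
display %18.16f newvar[1]
display %18.16f newvar_d[1]
```

If the calculation were carried out in float precision at each step, the `assert` would not be guaranteed to hold; that it holds is consistent with a single rounding at storage time.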

Floats have about 7.22 decimal digits of precision, although there is an argument for saying 7.5 digits; it depends on how you count partial digits.

The way computers store floating point (not to be confused with
**float**, because **double** is also an example of floating point) is

z = a * 2^p, where -2 < a < 2

Here are some examples of how numbers are stored:

         z       a     p
    ------------------------
         1       1     0
       1.5     1.5     0
         2       1     1
         3     1.5     1    (i.e., 1.5*2^1 = 3)
    ------------------------
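The decomposition in the table can be reproduced by recovering **p** and **a** from **z**. This is a sketch under stated assumptions: it handles only z > 0, and it finds the exponent as floor(ln(z)/ln(2)), which is adequate for these four values but could be off by one for values extremely close to a power of 2, because the logarithms are themselves rounded.

```stata
* Enter the four example values from the table
clear
input double z
1
1.5
2
3
end

* Exponent: largest p with 2^p <= z (assumes z > 0)
generate double p = floor(ln(z)/ln(2))

* Mantissa: what remains after dividing out 2^p, so 1 <= a < 2
generate double a = z/2^p

list z a p

* Recombining must give back z exactly, since dividing by 2^p is exact
assert z == a*2^p
```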

In **float**, 24 bits are allocated for **a**. Thus the largest
integer that can be exactly stored is 2^0 + 2^1 + ... + 2^23 =
(2^24)−1 = 16,777,215. Well, actually, 2^24 = 16,777,216 is also
precisely stored because it is even, but 16,777,217 cannot be precisely
stored. We can demonstrate these facts with Stata's
**float()** function, which rounds to float precision:

. display float(16777216)
16777216

. display float(16777217)
16777216

Good; Stata works just as theory would suggest.

Now how accurate is **float**? Well, for numbers like 16,777,217, the
absolute error is 1, so the relative error is

1/16,777,217 = 5.960e-08

Generally, when you store a number **z** as **float**, what is stored
is **z'**, and you can be assured that

z * (1 - 5.960e-08) <= z' <= z * (1 + 5.960e-08)
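The bound can be spot-checked with **float()**. The test values below are arbitrary choices for illustration, all well inside the normal range of **float**; this is a quick check, not a proof.

```stata
* Check |float(z) - z|/|z| <= 2^(-24) for a few arbitrary test values
foreach z in 0.1 3.14159 16777217 123456789 {
    * Show the value and its relative rounding error
    display %12.0g `z' "   " %10.3e abs(float(`z') - `z')/abs(`z')
    * Stop with an error if the bound is ever violated
    assert abs(float(`z') - `z')/abs(`z') <= 2^(-24)
}
```

Values that **float** stores exactly, such as small integers, show a relative error of 0; the others should fall at or below 5.960e-08.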

How many digits of accuracy is that? I can tell you exactly in binary: 24 binary digits. But how do you express binary digits in base 10? (By the way, thinking in binary is not difficult here: 24 binary digits means the smallest relative difference that can be represented is 2^(−24) = 5.96e−08, which is the same relative accuracy we obtained above.)

Returning to decimal, you might start by observing that 16,777,216 has 8 digits, but not every 8-digit number can be stored exactly, so we don't want to claim 8.

One way to get a base-10 representation would be to take
log_{10}(16,777,216) = 7.2247199. That is the way most numerical
analysts would convert digit accuracy between bases, so we could claim 7.22
decimal digits of accuracy.
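The conversion is a one-line calculation. As a small sketch, either of the following expresses 24 binary digits in decimal digits; both should agree with the 7.2247199 quoted above.

```stata
* Decimal digits equivalent to 24 binary digits
display log10(2^24)

* Same quantity: 24 binary digits times log10(2) decimal digits per binary digit
display 24*log10(2)
```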

The .22 part of 7.22 is subject to misinterpretation because what we just called .22 would, by some, be called one-half. Consider 16,777,216 and the integers just above it, not all of which can be stored exactly:

    true number     stored if float
    ----------------------------------------
    16,777,216      16,777,216
    16,777,217      16,777,216
    16,777,218      16,777,218
    16,777,219      16,777,220   [sic]
    16,777,220      16,777,220
    16,777,221      16,777,220
    16,777,222      16,777,222
    16,777,223      16,777,224   [sic]
    16,777,224      16,777,224
    16,777,225      16,777,224
    ----------------------------------------

Basically, odd numbers are being rounded to even numbers. A lot of people would call this the loss of half the digits in the last place, and we could develop a different formula that would label the accuracy 7.5. Sometimes in computer documentation, you will see the statement that float has 7.5 digits of accuracy. The authors say 7.5 rather than 7.22 because they worry that you might misinterpret what 7.22 means.

Label the difference how you wish; there are 24 binary digits, and the relative accuracy is +/− 2^(−24) = 5.960e−08.

**Note:** The [sic]s in the table above have to do with how numbers ending in
exactly “half” (5 in decimal) are rounded; this is the same
problem as rounding 1.5 and 2.5 to one digit in decimal.
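The table of true and stored values can be regenerated with a short loop over the same ten integers. This is a minimal sketch using **float()** as in the earlier demonstration; the column widths are arbitrary.

```stata
* Print each integer next to its float-rounded value
forvalues i = 16777216/16777225 {
    display %12.0fc `i' "    " %12.0fc float(`i')
}
```

The rows for 16,777,219 and 16,777,223 are the ones to look at: each lies exactly halfway between two storable values, so the result depends on the tie-breaking rule rather than on which neighbor is closer.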