How many significant digits are there in a float?

Title   The accuracy of the float data type
Author William Gould, StataCorp

float is a storage format used by Stata, not a computation format. When you have a number stored as a float and you make a calculation, such as

        . gen newvar = sqrt(oldvar)/sqrt(2)

oldvar is retrieved and is promoted to a double. The entire computation is then made in double precision, and that result is rounded to a float.
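The same promote, compute, and round sequence can be sketched outside Stata. The Python below is only an illustration: it assumes Stata's float behaves like an IEEE 754 single-precision number, to_float32() is a hypothetical helper (not a Stata function) that rounds a double to that precision, and the value 3 for oldvar is made up for the example.

```python
import math
import struct

def to_float32(x):
    """Round a double to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical stored value of oldvar; it is already exactly a float.
oldvar = to_float32(3.0)

# Step 1: the whole expression is evaluated in double precision.
result_double = math.sqrt(oldvar) / math.sqrt(2)

# Step 2: the result is rounded once, when it is stored as a float.
newvar = to_float32(result_double)

print(repr(result_double))  # double-precision intermediate
print(repr(newvar))         # value after rounding to float
```

The point of the sketch is that rounding happens only once, at storage time, rather than after each operation.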

Floats have about 7.22 decimal digits of precision, although there is an argument for saying 7.5 digits; it all depends on how you count partial digits.

The way computers store floating point (not to be confused with float, because double is also an example of floating point) is

        z = a * 2^p,    -2 < a < 2

Here are some examples of how numbers are stored:

         z       a          p
         1       1          0
         1.5     1.5        0
         2       1          1
         3       1.5        1     (i.e., 1.5*2^1 = 3)
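As a rough check of the table, the decomposition can be computed in Python. This is a sketch, not how Stata does it: decompose() is a hypothetical helper built on math.frexp(), which returns a mantissa between 0.5 and 1 in absolute value, so the mantissa is doubled and the exponent reduced by 1 to match the convention used above.

```python
import math

def decompose(z):
    """Return (a, p) with z == a * 2**p and 1 <= |a| < 2 (z must be nonzero)."""
    m, e = math.frexp(z)   # z == m * 2**e with 0.5 <= |m| < 1
    return 2 * m, e - 1    # rescale so that 1 <= |a| < 2

for z in (1, 1.5, 2, 3):
    a, p = decompose(z)
    print(f"{z:>5}  a = {a:<4}  p = {p}")
```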

In float, 24 bits are allocated for a. Thus the largest integer that uses all 24 bits is 2^0 + 2^1 + ... + 2^23 = (2^24)−1 = 16,777,215, and every integer up to that value can be stored exactly. Well, actually, 2^24 = 16,777,216 is also stored exactly because it is even, but 16,777,217 cannot be stored exactly. We can demonstrate these facts using Stata's float() function, which rounds to float precision:

        . display float(16777216)
        16777216

        . display float(16777217)
        16777216

Good; Stata works just as theory would suggest.
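The same arithmetic can be cross-checked outside Stata. The Python sketch below assumes that Stata's float corresponds to IEEE 754 single precision; to_float32() is a hypothetical stand-in for Stata's float() function.

```python
import struct

def to_float32(x):
    """Round a double to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2^0 + 2^1 + ... + 2^23: the largest integer that uses all 24 bits.
largest_all_bits = sum(2**k for k in range(24))

print(largest_all_bits)              # 16777215
print(int(to_float32(16777216.0)))   # 16777216, stored exactly
print(int(to_float32(16777217.0)))   # 16777216, rounded to a neighbor
```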

Now how accurate is float? Well, for numbers like 16,777,217, the absolute error is 1, so the relative error is

        1/16,777,217 = 5.960e-08

Generally, when you store a number z as float, what is stored is z', and you can be assured that

        z * (1 - 5.960e-08)  <=  z'  <=  z * (1 + 5.960e-08)
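
To see the bound at work, here is a small Python check on a handful of arbitrary positive values. It is a sketch under the assumption that float matches IEEE 754 single precision; to_float32() and within_bound() are hypothetical helpers, and the comparison uses exact fractions so that the check itself adds no rounding error.

```python
from fractions import Fraction
import struct

def to_float32(x):
    """Round a double to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

EPS = Fraction(1, 2**24)  # 2^(-24), about 5.960e-08

def within_bound(z):
    """Check z*(1 - EPS) <= z' <= z*(1 + EPS) exactly, for z > 0."""
    z_exact = Fraction(z)
    stored = Fraction(to_float32(z))
    return z_exact * (1 - EPS) <= stored <= z_exact * (1 + EPS)

# Arbitrary sample values, chosen only for illustration.
samples = [0.1, 1 / 3, 123456.789, 16777217.0, 1e10]
for z in samples:
    rel_err = abs(to_float32(z) - z) / z
    print(f"{z!r:>20}  relative error {rel_err:.3e}  within bound: {within_bound(z)}")
```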

How many digits of accuracy is that? I can tell you exactly in binary: 24 binary digits. But how do you count binary digits in base 10? (By the way, thinking in binary is not difficult here: 24 binary digits means the smallest relative difference that can be represented is 2^(−24) = 5.96e−08, which is the same relative accuracy we obtained above.)

Returning to decimal, you might start by observing that 16,777,216 has 8 digits, but not every 8-digit number can be stored, so we don't want to claim 8.

One way to get a base-10 representation would be to take log10(16,777,216) = 7.2247199. That is the way most numerical analysts would convert digit accuracy between bases, so we could claim 7.22 decimal digits of accuracy.
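
The conversion is a one-line calculation. A minimal Python version, with the variable names chosen only for this illustration:

```python
import math

bits = 24                             # binary digits allocated to a in a float
decimal_digits = math.log10(2**bits)  # same as bits * log10(2)

print(f"{decimal_digits:.7f}")        # 7.2247199
```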

The .22 part of 7.22 is subject to misinterpretation because what we just called .22 would, by some, be called one-half. Consider 16,777,216 and the integers just after it, which are too large for all of them to be stored exactly:

        true number           stored if float
        16,777,216            16,777,216
        16,777,217            16,777,216
        16,777,218            16,777,218
        16,777,219            16,777,220    [sic]
        16,777,220            16,777,220
        16,777,221            16,777,220
        16,777,222            16,777,222
        16,777,223            16,777,224    [sic]
        16,777,224            16,777,224
        16,777,225            16,777,224

Basically, odd numbers are being rounded to even numbers. A lot of people would call this the loss of half the digits in the last place, and we could develop a different formula that would label that difference 7.5. Sometimes in computer documentation, you will see the statement that float has 7.5 digits of accuracy. The authors say 7.5 and not 7.22 because they worry that you might misinterpret what 7.22 means.

Label the difference how you wish; there are 24 binary digits, and the relative accuracy is +/− 2^(−24) = 5.960e−08.

Note: The [sic]s in the table above have to do with how numbers ending in exactly “half” (5 in decimal) are rounded; this is the same problem as rounding 1.5 and 2.5 to one digit in decimal.
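
The table of stored values above can be regenerated outside Stata. The Python sketch below assumes float matches IEEE 754 single precision with the usual rule that ties go to the neighbor whose last stored bit is 0; to_float32() is a hypothetical helper, not a Stata function.

```python
import struct

def to_float32(x):
    """Round a double to the nearest IEEE 754 single, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Each integer from 16,777,216 through 16,777,225 and what a float keeps of it.
rows = [(n, int(to_float32(float(n)))) for n in range(16777216, 16777226)]

print(f"{'true number':>14}  {'stored if float':>16}")
for n, stored in rows:
    print(f"{n:>14,}  {stored:>16,}")
```

Every odd number in this range lies exactly halfway between two storable values, so the direction of rounding alternates, which is what the [sic] rows show.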