Statalist



st: Binary representation and number precision


From   Nicolas Van de Sijpe <nicolas.vandesijpe@economics.ox.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   st: Binary representation and number precision
Date   Sun, 13 Apr 2008 10:45:08 +0100

Hi,

I'm using Stata for a relatively elaborate data construction effort. To guard
against mistakes, at various stages in the data construction I want to check
whether certain combinations of variables (constructed from the raw data) yield
zero. But when doing so I run into problems because of Stata's way of storing
numbers. Consider the following example:

. set obs 1
obs was 0, now 1

. gen var1 = 6.1

. gen var2 = 6

. gen var3 = 0.1

I have not changed default type settings, so these variables are stored as
floats. Hence, 6.1 is stored as a number just a little bit smaller than 6.1, and
0.1 is stored as a number a smidgen bigger than 0.1. If I want to check whether
var1 - var2 - var3 = 0 (this is similar to the kind of checks I actually want to
carry out) I get:

. gen test = var1 - var2 - var3

. count if test == 0
    0

This makes sense: Stata thinks 6.1 is just a little bit smaller than 6.1 and
that 0.1 is just a little bit bigger than 0.1, so the test variable ends up
being a very small negative number (-9.686e-08 to be precise).
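The stored values can be inspected directly. The lines below are a minimal sketch, assuming only the built-in float() function and a wide display format chosen here for illustration; they print the float-rounded versions of 6.1 and 0.1 and the resulting difference.

```stata
* float() rounds its argument to float precision;
* the %21.18f format is an arbitrary wide format that exposes the extra digits
display %21.18f float(6.1)
display %21.18f float(0.1)

* Same arithmetic as the test variable, evaluated in double precision
* from the float-rounded inputs
display %12.4e float(6.1) - 6 - float(0.1)
```

The first value prints slightly below 6.1 and the second slightly above 0.1, which is why the difference comes out as a small negative number rather than zero.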

Basically my question is: what is the best way to get around this? I would like
to get an exact zero in the above example, that way I can identify observations
for which the data has not been constructed correctly (test variable would be
non-zero) without having to worry that the non-zero number I'm getting is due to
number precision issues. All I can think of is storing all variables as double
right from the beginning and making sure that every newly created variable is
also double. Am I right to think this would solve the problem because then the
precision at which Stata stores the number is the same as the precision it uses
to perform calculations? Is there an alternative solution?
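One way to probe the double idea is to rerun the example with double storage. This is only a sketch: the variable names d1, d2, d3 and dtest and the 1e-8 tolerance are assumptions made for illustration. Since 6.1 and 0.1 have no exact binary representation at any finite precision, the double version should shrink the discrepancy without making it exactly zero, so a tolerance-based check is shown alongside the exact one.

```stata
clear
set obs 1

* Same example as above, but every variable is stored as double
gen double d1 = 6.1
gen double d2 = 6
gen double d3 = 0.1
gen double dtest = d1 - d2 - d3

* Exact comparison: dtest is far smaller than in the float case,
* but it is still not exactly zero
count if dtest == 0

* Tolerance comparison; the 1e-8 threshold is an assumed value
* and should be chosen to suit the scale of the data
count if abs(dtest) < 1e-8
```

With a tolerance, observations that differ from zero only by rounding error pass the check, while genuine construction mistakes (which would typically be much larger) still show up.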

In this example, I could also have done:

. count if var1 - var2 - var3 == float(var1) - float(var2) - float(var3)
    1

But I think this would quickly become very tricky in more complex situations.
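A note on what float() does in a comparison like the one above, as a hedged sketch rather than a recommendation: because var1 is already stored as a float, float(var1) returns the same value, so wrapping the variables changes nothing. float() matters when a float variable is compared with a literal, which Stata evaluates in double precision.

```stata
clear
set obs 1
gen var1 = 6.1

* var1 is already a float, so float(var1) is identical to var1
count if var1 == float(var1)

* The literal 6.1 is held in double precision,
* so it does not match the float-rounded value stored in var1
count if var1 == 6.1

* Rounding the literal to float precision makes the comparison succeed
count if var1 == float(6.1)
```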

Lastly, I usually copy and paste data from Excel into Stata's data editor, but
if I'm not mistaken it's not possible this way to immediately identify variables
as double (changing default type to double doesn't seem to make a difference).
So, if I want all my variables to be double right from the beginning, is the
only option to use commands such as insheet?
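For reference, a minimal sketch of the insheet route. The file name example_double.csv is hypothetical and is written by the sketch itself so that it can be run as is; in practice the file would be exported from Excel. The double option of insheet asks for non-integer numeric columns to be stored as double, and set type double changes the default for variables created later with generate.

```stata
* Write a tiny comma-separated file (hypothetical name) so the sketch is self-contained
file open fh using "example_double.csv", write text replace
file write fh "var1,var2,var3" _n "6.1,6,0.1" _n
file close fh

* Read the file, storing non-integer numeric variables as double
insheet using "example_double.csv", comma double clear
describe

* Make double the default storage type for newly generated variables
set type double
gen test = var1 - var2 - var3
describe test
```

Note that set type double persists for the rest of the session; set type float restores the default.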

Many thanks,

Nicolas


