
From: Nicolas Van de Sijpe <nicolas.vandesijpe@economics.ox.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: Binary representation and number precision
Date: Sun, 13 Apr 2008 10:45:08 +0100

Hi,

I’m using Stata for a relatively elaborate data construction effort. To guard against mistakes, at various stages in the data construction I want to check whether certain combinations of variables (constructed from the raw data) yield zero. But when doing so I run into problems because of Stata’s way of storing numbers. Consider the following example:

    . set obs 1
    obs was 0, now 1
    . gen var1 = 6.1
    . gen var2 = 6
    . gen var3 = 0.1

I have not changed default type settings, so these variables are stored as floats. Hence, 6.1 is stored as a number just a little bit smaller than 6.1, and 0.1 is stored as a number a smidgen bigger than 0.1. If I want to check whether var1 - var2 - var3 = 0 (this is similar to the kind of checks I actually want to carry out) I get:

    . gen test = var1 - var2 - var3
    . count if test == 0
        0

This makes sense: Stata thinks 6.1 is just a little bit smaller than 6.1 and that 0.1 is just a little bit bigger than 0.1, so the test variable ends up being a very small negative number (-9.686e-08 to be precise).

Basically my question is: what is the best way to get around this? I would like to get an exact zero in the above example, that way I can identify observations for which the data has not been constructed correctly (test variable would be non-zero) without having to worry that the non-zero number I’m getting is due to number precision issues.

All I can think of is storing all variables as double right from the beginning and making sure that every newly created variable is also double. Am I right to think this would solve the problem because then the precision at which Stata stores the number is the same as the precision it uses to perform calculations? Is there an alternative solution?

In this example, I could also have done:

    . count if var1 - var2 - var3 == float(var1) - float(var2) - float(var3)
        1

But I think this would quickly become very tricky in more complex situations.
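For illustration, one alternative to testing for an exact zero is to compare against a small tolerance. The sketch below is not from the original post. It reuses the variable names from the example above, while the threshold 1e-6 and the names ending in `_d` are assumptions chosen for this example and would need to be adapted to the scale of the real data.

```stata
* Minimal sketch: tolerance-based zero check (threshold 1e-6 is an assumption)
clear
set obs 1

* Default storage type (float), as in the example above
generate var1 = 6.1
generate var2 = 6
generate var3 = 0.1
generate test = var1 - var2 - var3

* Exact comparison: affected by float rounding
count if test == 0

* Tolerance comparison: tiny rounding residue is treated as zero
count if abs(test) < 1e-6

* Stop a do-file if any observation is off by more than the tolerance
assert abs(test) < 1e-6

* Same check with double storage (hypothetical names var1_d, ...):
* the residue is much smaller, but for these values it is still not exactly zero
generate double var1_d = 6.1
generate double var2_d = 6
generate double var3_d = 0.1
generate double test_d = var1_d - var2_d - var3_d
count if test_d == 0
count if abs(test_d) < 1e-6
```

With these values the exact comparisons should count 0 observations and the tolerance comparisons 1, because 6.1 and 0.1 have no exact binary representation in either float or double.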
Lastly, I usually copy and paste data from Excel into Stata’s data editor, but if I’m not mistaken it’s not possible this way to immediately identify variables as double (changing default type to double doesn’t seem to make a difference). So, if I want all my variables to be double right from the beginning, is the only option to use commands such as insheet?

Many thanks,
Nicolas

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:

- **Re: st: Binary representation and number precision** *From:* "Friedrich Huebler" <fhuebler@gmail.com>
- **Re: st: Binary representation and number precision** *From:* Maarten buis <maartenbuis@yahoo.co.uk>

