
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: RE: st: A bug in egen and gen?

From   "Liao, Junlin" <>
To   "" <>
Subject   RE: RE: st: A bug in egen and gen?
Date   Sat, 19 Feb 2011 17:06:40 +0000


I disagree on this point. For one, variables stored as float may be process variables, such as calculated values: they will be fed into statistical calculations and subjected to many operations. For two, a case in point: Stata stores 4.1 not as 4.1 but as 4.0 followed by many digits of 9. Excel uses double precision for numeric values and displays 4.1 followed by many 0's. But that display is just an impression; in reality the value is still stored as something close to 4.1 (Excel achieves this by not showing the last three digits), and it is closer to 4.1 than a single-precision number is. Certainly a number is not stored as exactly the number you typed plus many zeros; and even if it were, the extra zeros would add nothing (in that case it would be waste rather than noise; noise is a concept of inaccuracy and belongs to the float type). As you have stated, because of the unknown number of iterations in calculations, double is much better than float. If we feed float values into double-precision calculations, there will be more noise. I carefully weighed the advantages and disadvantages before making my recommendations. They are not random thoughts; almost all other packages already do it this way.
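The 4.1 example can be checked directly. The Python sketch below (an illustration, not Stata's own code) uses the standard `struct` module to round a value to IEEE 754 single precision, the same 4-byte format as Stata's float; Python's native floats are doubles:

```python
import struct

def float32_roundtrip(x):
    # Pack x as an IEEE 754 single-precision value, then unpack it
    # back into a Python float (double), exposing what was stored.
    return struct.unpack('<f', struct.pack('<f', x))[0]

single = float32_roundtrip(4.1)   # what a float variable holds
double = 4.1                      # what a double variable holds

print(f"{single:.17g}")   # 4.0999999046325684
print(f"{double:.17g}")   # 4.0999999999999996
```

Neither format stores 4.1 exactly, but the double is closer to 4.1 by roughly eight decimal digits.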

I do have datasets with millions of records and deal with them regularly in my financial analysis. But I'm not bothered by storage at all, even though I know not all my data are stored in the most efficient way. I'm constantly reminded to read Bill Gould. I did read the article but still failed to see its relevance. Analysis with millions of records sounds fun; I just do not know how many Stata users have that luxury/burden. What is a typical Stata user, and what is his or her typical sample size? The companies developing memory and storage are in the gigabyte world, and we are arguing about sizes in megabytes and kilobytes. That's why I'm not impressed by the argument that we should stay with float. Technically you can argue that float is accurate enough, but that is more a technical obsession than a real benefit. I see added accuracy with no marginal cost.
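To put the megabytes-versus-gigabytes point in numbers, here is a back-of-envelope sketch (the 10-million-observation dataset is an assumed size for illustration):

```python
# Storage cost of one numeric variable across 10 million observations,
# comparing Stata's float (4 bytes per value) with double (8 bytes per value).
obs = 10_000_000
float_mb  = obs * 4 / 1024**2   # float storage in MB
double_mb = obs * 8 / 1024**2   # double storage in MB

print(f"float:  {float_mb:.0f} MB")   # ~38 MB
print(f"double: {double_mb:.0f} MB")  # ~76 MB
```

Even at that scale, doubling one variable costs tens of megabytes, which is the kind of difference the argument above weighs against gigabytes of available memory.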

As long as Stata gives me the option to go double all the time, I'm happy with Stata.

From: [] on behalf of Maarten buis []
Sent: Saturday, February 19, 2011 6:08 AM
Subject: RE: RE: st: A bug in egen and gen?

--- On Fri, 18/2/11, Liao, Junlin wrote:
> If float can indeed do the job with sufficient precision,
> as everybody who argued for float indicated, then why does
> Stata perform its calculations with double precision?

The reason is that many statistical techniques involve a great
many computations, and a small error made many times can easily
turn into a big error. OK, you might say, then why don't I
prevent further error by storing my data as doubles? The answer
is that you are not preventing an error that way: a float
already stores a variable with far more precision than that
with which it was measured, so the extra digits you would be
storing contain exactly zero information.
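The accumulation effect Maarten describes is easy to demonstrate. The Python sketch below (an illustration, not Stata's internals) sums 0.1 a million times, rounding the running total to single precision at each step versus keeping it in double:

```python
import struct

def to_f32(x):
    # Round x to IEEE 754 single precision (the 4-byte float format).
    return struct.unpack('<f', struct.pack('<f', x))[0]

f32_tenth = to_f32(0.1)   # 0.1 as stored in a float
total32 = 0.0
total64 = 0.0
for _ in range(1_000_000):
    # Single precision: the running total is rounded after every add.
    total32 = to_f32(total32 + f32_tenth)
    # Double precision: full 8-byte arithmetic throughout.
    total64 += 0.1

print(total32)   # drifts from 100000 by hundreds
print(total64)   # off only far beyond the measured digits
```

One rounding error per addition is tiny, but a million of them add up to a visible error in single precision, while the double-precision total stays accurate to far more digits than any measurement carries.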

-- Maarten

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

*   For searches and help try:


