Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Liao, Junlin" <junlin-liao@uiowa.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: RE: st: A bug in egen and gen? |
Date | Sat, 19 Feb 2011 17:06:40 +0000 |
Maarten, I disagree on this point. First, variables stored as float may be intermediate variables, such as calculated variables. They will be fed into statistical calculations and subjected to further manipulation. Second, case in point: Stata stores 4.1 as a float not as 4.1, but as 4.0 followed by a run of 9s. Excel uses double precision for numeric values, and it shows 4.1 followed by many 0s. However, that is just an impression. In reality, the value is still stored as something close to 4.1 (Excel achieves this by not displaying the last three digits), only closer to 4.1 than a single-precision number would be. Certainly a number is not stored as the number you asked for followed by many zeros. If it were, the extra zeros would add nothing (and even then they would be waste rather than noise; noise is a matter of inaccuracy, and that belongs to the float type).

As you have stated, because the number of iterations in a calculation is unknown, double is much better than float. If we feed float values into a double-precision calculation, there will be more noise. I carefully weighed the advantages and disadvantages before making my recommendations. They are not random thoughts; almost all other packages already do this.

I do have datasets with millions of records and deal with them regularly in my financial analysis. But I'm not bothered by storage at all, even though I know not all my data is stored in the most efficient way. I'm constantly reminded to read Bill Gould. I did read the article but still failed to see its relevance. Analysis with millions of records sounds fun. I just do not know how many Stata users have that luxury/burden. What is a typical Stata user, and what is his or her typical sample size? The companies developing memory and storage live in the gigabyte world, and we are arguing about sizes in megabytes and kilobytes. That's why I'm not impressed by the argument that we should stay with float. Technically you can argue that float is accurate enough, but that is more a technical obsession than a real benefit.
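For readers who want to see the size of the effect under discussion, here is a minimal sketch in Python rather than Stata. It assumes IEEE 754 single and double precision (the formats behind Stata's float and double storage types); the helper `to_single` and the repeated-4.1 data are hypothetical, made up for illustration. It prints how far 4.1 sits from its stored value in each format, then sums the same values in double precision to isolate the part of the error that comes from storage alone.

```python
import math
import struct
from decimal import Decimal


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value.

    Hypothetical helper: packs the value into 4 bytes and unpacks it again.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


exact = Decimal("4.1")

# Decimal(float) exposes the exact binary value that is actually stored.
stored_double = Decimal(4.1)
stored_single = Decimal(to_single(4.1))

err_double = abs(stored_double - exact)
err_single = abs(stored_single - exact)

print(f"double: {stored_double:.20f}  error {err_double:.3e}")
print(f"single: {stored_single:.20f}  error {err_single:.3e}")

# Hypothetical data: the value 4.1 recorded n times.
# math.fsum returns a correctly rounded double-precision sum, so any gap
# between the two totals below is due to how the inputs were stored.
n = 100_000
sum_from_double = math.fsum([4.1] * n)
sum_from_single = math.fsum([to_single(4.1)] * n)

print(f"sum of double-stored values: {sum_from_double!r}")
print(f"sum of single-stored values: {sum_from_single!r}")
```

Neither format holds 4.1 exactly; both stored values fall slightly below it, and the single-precision one is the farther of the two. Whether that gap matters in practice depends on how precisely the underlying quantity was measured, which is the point in dispute in this thread.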
I see added accuracy with no marginal cost. As long as Stata gives me the option to go double all the time, I'm happy with Stata.

Junlin

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Maarten buis [maartenbuis@yahoo.co.uk]
Sent: Saturday, February 19, 2011 6:08 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: RE: st: A bug in egen and gen?

--- On Fri, 18/2/11, Liao, Junlin wrote:
> If float can indeed do the job with sufficient precision
> like everybody who tried to argue for float indicated, then
> why Stata performs its calculations with double precision?

The reason is that many statistical techniques involve a great many computations, and a small error made many times can easily turn into a big error.

Ok, you might say, then why don't I prevent a further error by storing my data as a double? The answer is that you are not preventing an error that way: a float stores a variable with much more precision than that with which it was measured, so the extra digits you are storing contain exactly 0 information.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/