
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: A bug in egen and gen?


From   "Liao, Junlin" <junlin-liao@uiowa.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: A bug in egen and gen?
Date   Thu, 17 Feb 2011 22:48:14 +0000

I took two points from Bill Gould's article on precision:

1. Stata makes all calculations in double precision, and
2. float provides more than enough precision for most data applications
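To make the float/double distinction concrete, here is a minimal sketch in Python rather than Stata. It assumes only that Stata's float and double are the 4-byte and 8-byte IEEE 754 binary formats; the helper name to_float32 is made up for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an 8-byte double) to the nearest 4-byte float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Bytes needed per stored value in each format.
print(struct.calcsize("<f"), struct.calcsize("<d"))

x = 0.1
x32 = to_float32(x)

# A decimal such as 0.1 is not exactly representable in either format,
# and the 4-byte version differs from the 8-byte version.
print(x32 == x)

# The relative difference is bounded by the float format's rounding error.
print(abs(x32 - x) / x)

# A small integer survives the round trip unchanged.
print(to_float32(7.0) == 7.0)
```

The relative difference for 0.1 is on the order of 1e-8, which is the precision that point 2 describes as sufficient for most data.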

What really matters comes down to storage. The experiments did not include decimal numbers, so the final storage space did not differ. The point Nick tried to make is that not all numbers are integers, and for non-integer values compress makes no difference, since compress does not convert decimal double numbers to float type (even though it should when the recast does not change the variable's values).

I looked at my data files. The largest of them is a 135MB dataset. If I convert all float variables to double, its size increases by 14% (to 155MB). That is indeed a significant waste of storage. However, I would still argue for precision. Storage capacity is increasing as fast as memory is. In fact, I keep a few process variables for convenience. If I drop those variables, the file shrinks to 129MB with all decimal variables stored as double and 119MB with all decimal variables stored as float (the difference is now less than 10%). I can reduce the size of my file further by getting rid of calculated fields. You can blame me for being careless in keeping my files. But in reality I have hundreds of gigabytes wasted every day (sitting there idle) anyway. My point is that, realistically, the storage factor is not that important.
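Why the increase is well below 100% can be sketched with some simple arithmetic, again in Python. Recasting float to double adds 4 bytes per float value, so the relative growth equals the share of each observation's bytes held in float variables. The variable layout below is hypothetical, and the sketch ignores file headers, labels, and string variables.

```python
# Bytes per value for Stata's numeric storage types.
WIDTH = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}

def row_bytes(types):
    """Bytes used by one observation, given the storage type of each variable."""
    return sum(WIDTH[t] for t in types)

def growth_if_floats_become_doubles(types):
    """Relative growth in observation width when every float is recast to double."""
    before = row_bytes(types)
    after = row_bytes(["double" if t == "float" else t for t in types])
    return (after - before) / before

# Hypothetical layout: mostly integer-coded variables plus one decimal variable.
layout = ["byte"] * 10 + ["int"] * 5 + ["long"] * 2 + ["float"] * 1
print(growth_if_floats_become_doubles(layout))

# A dataset made up only of float variables doubles in size.
print(growth_if_floats_become_doubles(["float"] * 3))
```

Under this simplification, an increase of about 14% would mean that float variables made up roughly that share of the original file.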

In the old days storage and memory mattered a lot. Programs were much smaller and probably more efficient. The rapid increase in computing power is making the distinction between float and double increasingly irrelevant. As for Bill's view, float may indeed provide more than enough precision for most data applications. However, if precision can be gained at negligible cost (I seriously doubt that anyone running Stata today has a capacity constraint on their computer), why not? I do a lot of data analysis, but still, before my next computer upgrade, the chance of running out of disk space because I set my precision to double is zero. That's why I choose to change the default to double, and I recommend that others do so as well.

Tx,

Junlin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten buis
Sent: Thursday, February 17, 2011 3:09 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: A bug in egen and gen?

--- On Thu, 17/2/11, Liao, Junlin wrote:
> I just fail to see your point "You can get some of that back by
> -compress-, but not all". My experiment clearly proves that what
> matters is the "final" storage data type. I understand that using
> double in place of float or long will increase memory requirements.
> My point is that computing power is increasing exponentially. For
> example, every computer I use has at least 4GB of memory.
> The machine I load with Stata has 8GB. Memory is the least of my concerns,
> but accuracy is always important.

If you store real data as double you are trying to regain accuracy that does not exist in your data. All you have done is doubled the size of your file to store random noise. -compress- will only help avoid this if your variables are all integers.
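The integer-only behaviour of -compress- can be imitated with a short Python sketch. This is a simplified imitation and not Stata's implementation: it assumes a variable stored as double is demoted only to an integer type whose range holds every value, never to float, and it ignores missing values. The function name smallest_type is made up for this illustration.

```python
# Non-missing ranges of Stata's integer storage types, smallest first.
INTEGER_RANGES = [
    ("byte", -127, 100),
    ("int", -32767, 32740),
    ("long", -2147483647, 2147483620),
]

def smallest_type(values):
    """Smallest type this simplified rule would pick for a double variable."""
    if all(float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        for name, type_min, type_max in INTEGER_RANGES:
            if type_min <= lo and hi <= type_max:
                return name
    # Any non-integer value, or an integer beyond long, keeps the variable double.
    return "double"

print(smallest_type([1.0, 2.0, 100.0]))   # all integers within byte range
print(smallest_type([1.0, 101.0]))        # integers, but 101 exceeds byte
print(smallest_type([0.5, 1.0]))          # one decimal value blocks any demotion
```

Under this rule a single decimal value is enough to keep the whole variable at 8 bytes per observation.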

There are situations where storing or generating variables as doubles makes sense, but they are the exception, not the rule.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




