From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: sum: collapse vs egen
Date: Sat, 5 Sep 2009 17:52:02 +0100

I can't add to what I said earlier, or wrote much earlier as cited,
except to emphasise that this function is here on a knife-edge:

. di round(8.755, 0.01)
8.76

. di round(3.86 + 4.895, 0.01)
8.75

If you want exact decimal calculations, you need to do all your
workings in integers, and convert only when obliged to.

Nick
n.j.cox@durham.ac.uk

ciccarec@uniroma2.it

Hello Nick, thanks for quick answering.

Quoting "Nick Cox" <n.j.cox@durham.ac.uk>:

3. Not your question, but -egen, sum()- is a poor way to do a sum.

but in the numerical example I provided egen,sum() turns out to
provide the exact answer (8.76), which I can't get using collapse (sum).
My problem is that I never use "egen, sum" when working with real data
while I often use "collapse (sum)", that seems to be not very
appropriate.

I think the problem is in the way the "collapse" command and the
"round" function are related: I verified that if the line
"gen double r2=round(a*100)/100" (see after the "collapse" command)
is separated in 2 parts, like:

collapse (sum) a
gen double temp=a*100
gen double r2=round(temp)/100

The resulting r2 is correct (but I don't know why).

Quoting "Nick Cox" <n.j.cox@durham.ac.uk>:

> I have three comments here.
>
> 1. -egen- by default will generate -float- variables (unless you have
> -set type double-). So, you shouldn't be surprised to lose a little
> precision there. Functions like -round()- that may make knife-edge
> decisions are likely to show this up. The first FAQ cited below is a
> similar case.
>
> 2. This is all part of a larger issue: Stata works in binary. Stata
> does not do decimal arithmetic!
>
> See for example
>
> FAQ     . . . . . . . . . . . . . Results of the mod(x,y) function
>         . . . . . . . . . . . . . . . N. J. Cox and T. J. Steichen
>         2/03    Why does the mod(x,y) function sometimes give
>                 puzzling results?
>                 Why is mod(0.3,0.1) not equal to 0?
>                 http://www.stata.com/support/faqs/data/mod.html
>
> FAQ     . . . . . . . . . . . . The accuracy of the float data type
>         . . . . . . . . . . . . . . . . . . . . . . . . .  W. Gould
>         5/01    How many significant digits are there in a float?
>                 http://www.stata.com/support/faqs/data/prec.html
>
> FAQ     . . Comparing floating-point values (the float function)
>         . . . . . . . . . . . . . . . . . . . . . . . . J. Wernow
>         3/01    Why can't I compare two values that I know are equal?
>                 http://www.stata.com/support/faqs/data/float.html
>
> FAQ     . . Why am I losing precision with large whole numbers?
>         . . . . . . . . . . . . UCLA Academic Technology Services
>         7/08    http://www.ats.ucla.edu/stat/stata/faq/longid.htm
>
> SJ-8-2  pr0038  Mata Matters: Overflow, underflow & IEEE
>                 floating-point format
>         . . . . . . . . . . . . . . . . . . . . . . J. M. Linhart
>         Q2/08   SJ 8(2):255--268                     (no commands)
>         focuses on underflow and overflow and details of how
>         floating-point numbers are stored in the IEEE 754
>         floating-point standard
>
> SJ-6-4  pr0025  . . . . . . . . . . . Mata matters: Precision
>         . . . . . . . . . . . . . . . . . . . . . . . .  W. Gould
>         Q4/06   SJ 6(4):550--560                     (no commands)
>         looks at programming implications of the floating-point,
>         base-2 encoding that modern computers use
>
> SJ-6-2  dm0022  Tip 33: Sweet sixteen: Hexadec. formats &
>                 precision problems
>         . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>         Q2/06   SJ 6(2):282--283                     (no commands)
>         tip for using hexadecimal formats to understand precision
>         problems in Stata
>
> 3. Not your question, but -egen, sum()- is a poor way to do a sum. A
> better way is -summarize, meanonly-. However, I guess that your real
> problem is understanding what -egen- does with some real data, but
> nevertheless note using -summarize- directly and picking up r(sum) is
> always better for a single sum.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Carlo
>
> Here is my code:
>
> clear
> version 9.2
> set obs 1
> gen double a=3.86
> save data1,replace
>
> clear
> set obs 1
> gen double a=4.895
> save data2,replace
>
> use data1,clear
> append using data2
> egen sum=sum(a)
> gen double r1=round(sum*100)/100
> list
> collapse (sum) a
> gen double r2=round(a*100)/100
> list
>
> Why are r1 and r2 not equal ?
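
The two points in Nick's reply at the top of this thread can be tried directly in Stata. What follows is only a minimal sketch, not code posted by anyone in the thread. It assumes the %21x hexadecimal display format (the subject of the Tip 33 reference cited above) to look at the stored binary values. The use of thousandths as the integer unit and the scalar names thou and hund are hypothetical choices made for this illustration.

```stata
* Knife-edge: the literal 8.755 and the sum 3.86 + 4.895 are held as
* two different doubles, one just above and one just below 8.755.
* In hexadecimal they differ in the final digit of the mantissa.
display %21x 8.755
display %21x 3.86 + 4.895

* Workings in integers (thousandths is an illustrative assumption):
* 3.86 becomes 3860 and 4.895 becomes 4895, both held exactly.
scalar thou = 3860 + 4895

* Round half up to hundredths using integer arithmetic only:
* (8755 + 5)/10 is exactly 876, so floor() has no knife-edge to decide.
scalar hund = floor((scalar(thou) + 5) / 10)

* Convert to a decimal only when obliged to, here for display.
display %4.2f scalar(hund) / 100
```

The last line displays 8.76. The sum is formed from integers, which binary doubles hold exactly, so the rounding decision is made on the exact value 8755 and not on a binary approximation to 8.755.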

