Statalist



RE: st: RE: sum: collapse vs egen


From   ciccarec@uniroma2.it
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: RE: sum: collapse vs egen
Date   Sat, 05 Sep 2009 19:01:40 +0200

Now I see what you meant before.
This solved my problem.
Thank you,
Carlo


Quoting "Nick Cox" <n.j.cox@durham.ac.uk>:

I can't add to what I said earlier, or wrote much earlier as cited,
except to emphasise that this function is here on a knife-edge:

. di round(8.755, 0.01)
8.76

. di round(3.86 + 4.895, 0.01)
8.75
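
One way to see why these two calls disagree is to look at the binary values that are actually stored. The lines below are a sketch, not part of the original exchange; they assume only the built-in %21x hexadecimal display format (the subject of Tip 33 cited further down this thread) and a long fixed format.

```stata
* Show the doubles behind the two -round()- calls above.
display %21x 8.755
display %21x 3.86 + 4.895

* The same two values in decimal, with more digits than the default format shows.
display %21.18f 8.755
display %21.18f 3.86 + 4.895

* Direct comparison: 1 if equal, 0 if not. The differing -round()- results
* above imply that the two values cannot be the same double.
display (3.86 + 4.895) == 8.755
```

If the two hexadecimal strings differ in their last digits, the sum sits just to one side of the half-way point and -round()- is only reporting that.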

If you want exact decimal calculations, you need to do all your workings
in integers, and convert only when obliged to.

Nick
n.j.cox@durham.ac.uk
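
The advice to work in integers can be sketched as follows, assuming the amounts can be held as whole thousandths. The variable names a_th, r_th and r and the scale factor of 1000 are hypothetical choices made for this illustration.

```stata
clear
* Hold each amount as an integer number of thousandths (3.860 and 4.895).
input long a_th
3860
4895
end

* Integer sums of this size are represented exactly: 3860 + 4895 = 8755.
collapse (sum) a_th

* Round to the nearest 10 thousandths (one hundredth), still in integer units.
gen long r_th = round(a_th, 10)

* Convert to a decimal value only at the end, for display.
gen double r = r_th / 1000
format r %4.2f
list
```

Because the sum 8755 is exact, the half-way case is settled by -round()-'s own rule for ties rather than by representation error.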

ciccarec@uniroma2.it

Hello Nick,
thanks for the quick answer.

Quoting "Nick Cox" <n.j.cox@durham.ac.uk>:
3. Not your question, but -egen, sum()- is a poor way to do a sum.

But in the numerical example I provided, "egen, sum()" turns out to
give the exact answer (8.76), which I can't get using "collapse (sum)".
My problem is that I never use "egen, sum()" when working with real data,
while I often use "collapse (sum)", which now seems not to be very
appropriate.

I think the problem lies in how the "collapse" command and the "round"
function interact: I verified that if the line "gen double
r2=round(a*100)/100" (the one after the "collapse" command) is split
into two steps, like:
collapse (sum) a
gen double temp=a*100
gen double r2=round(temp)/100

then the resulting r2 is correct (but I don't know why).


Quoting "Nick Cox" <n.j.cox@durham.ac.uk>:

I have three comments here.

1. -egen- by default will generate -float- variables (unless you have
-set type double-). So, you shouldn't be surprised to lose a little
precision there. Functions like -round()- that may make knife-edge
decisions are likely to show this up. The first FAQ cited below is a
similar case.

2. This is all part of a larger issue: Stata works in binary. Stata does
not do decimal arithmetic!

See for example

FAQ     . . . . . . . . . . . . . . . . . . . Results of the mod(x,y) function
        . . . . . . . . . . . . . . . . . . . . . N. J. Cox and T. J. Steichen
        2/03    Why does the mod(x,y) function sometimes give
                puzzling results?
                Why is mod(0.3,0.1) not equal to 0?
                http://www.stata.com/support/faqs/data/mod.html

FAQ     . . . . . . . . . . . . . . . . .  The accuracy of the float data type
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
        5/01    How many significant digits are there in a float?
                http://www.stata.com/support/faqs/data/prec.html

FAQ     . . . . . . . . . Comparing floating-point values (the float function)
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Wernow
        3/01    Why can't I compare two values that I know are equal?
                http://www.stata.com/support/faqs/data/float.html

FAQ     . . . . . . . . .  Why am I losing precision with large whole numbers?
        . . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
        7/08    http://www.ats.ucla.edu/stat/stata/faq/longid.htm

SJ-8-2  pr0038  Mata Matters: Overflow, underflow & IEEE floating-point format
        . . . . . . . . . . . . . . . . . . . . . . . . . . . .  J. M. Linhart
        Q2/08   SJ 8(2):255--268                                 (no commands)
        focuses on underflow and overflow and details of how
        floating-point numbers are stored in the IEEE 754
        floating-point standard

SJ-6-4  pr0025  . . . . . . . . . . . . . . . . . . .  Mata matters: Precision
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
        Q4/06   SJ 6(4):550--560                                 (no commands)
        looks at programming implications of the floating-point,
        base-2 encoding that modern computers use

SJ-6-2  dm0022  . Tip 33: Sweet sixteen: Hexadec. formats & precision problems
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q2/06   SJ 6(2):282--283                                 (no commands)
        tip for using hexadecimal formats to understand precision
        problems in Stata

3. Not your question, but -egen, sum()- is a poor way to do a sum. A
better way is -summarize, meanonly-. However, I guess that your real
problem is understanding what -egen- does with some real data, but
nevertheless note using -summarize- directly and picking up r(sum) is
always better for a single sum.
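
A minimal sketch of the -summarize, meanonly- route in comment 3, using the two values from the question. The scalar name total is a hypothetical choice, and the data are entered with -input- here rather than read from the saved files.

```stata
clear
input double a
3.86
4.895
end

* -summarize, meanonly- does the minimum work needed and leaves the sum in r(sum).
summarize a, meanonly

* Copy r(sum) into a scalar straight away, so nothing passes through a float variable.
scalar total = r(sum)

* Inspect the stored sum in hexadecimal, then apply the same rounding as in the question.
display %21x scalar(total)
display round(scalar(total)*100)/100
```

No new variable is created, so the default -float- storage type described in comment 1 never comes into play.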

Nick
n.j.cox@durham.ac.uk

Carlo

    Here is my code:

clear
version 9.2
set obs 1
gen double a=3.86
save data1,replace

clear
set obs 1
gen double a=4.895
save data2,replace

use data1,clear
append using data2
egen sum=sum(a)
gen double r1=round(sum*100)/100
list
collapse (sum) a
gen double r2=round(a*100)/100
list

Why are r1 and r2 not equal?
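
One way to investigate, in line with comment 1 of the reply quoted above, is to check how -egen- stored its result. This is only a sketch: it rebuilds the two observations with -input- instead of the saved files, it assumes -set type double- has not been issued, and the names sum_d and r1_d are hypothetical.

```stata
clear
input double a
3.86
4.895
end

* As in the question: no storage type given, so -egen- creates a float.
egen sum = sum(a)

* The same total, explicitly stored as a double.
egen double sum_d = sum(a)

* Confirm the storage types, then compare the stored bits.
describe sum sum_d
display %21x sum[1]
display %21x sum_d[1]

* Apply the rounding from the question to both versions and compare.
gen double r1   = round(sum*100)/100
gen double r1_d = round(sum_d*100)/100
list
```

If the two hexadecimal displays differ, the float version has been rounded to fewer bits before -round()- ever sees it, which is enough to move a value across a knife-edge.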

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



