RE: st: collapse (sum) versus egen (sum)

From   [email protected]
To   [email protected]
Subject   RE: st: collapse (sum) versus egen (sum)
Date   Thu, 19 Aug 2004 16:22:08 +0200

Thanks Dan, 

I will look into it. I was also VERY surprised to see that there is a
difference between the two commands. I considered them -- and I still
believe they are intended -- to be equivalent. 

--- Ben


Ben Kriechel

Research Centre for Education
and the Labour Market
<[email protected]> 

From: Dan Blanchette [mailto:[email protected]] 
Sent: Thursday, 19 August 2004 16:13
To: [email protected]
Subject: Re: st: collapse (sum) versus egen (sum)

Perhaps more information about the discrepancy you noticed would be helpful?

-egen- and -collapse- both use -generate-'s sum function.

The only difference I see is that -collapse- generates the new variable as
datatype double, whereas -egen- lets you choose, though the default is float.
If you choose a datatype that cannot handle the resulting sum, then you
would end up with missing values and thus a different result than you would
with -collapse-.  If your sums are larger than a float's maximum,
then you would get varying results.  Try your program again choosing
double with -egen- to see if this fixes the problem.
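
A minimal sketch of that check, not taken from the original data: the
variable names aantal and opl114 come from the original post, while the
toy values and the names oplantal_f, oplantal_d and total_c are
hypothetical. A float carries only about 7 significant digits, so
rounding differences can show up well before a sum overflows.

```stata
* Toy data (hypothetical values); variable names follow the original post
clear
input opl114 aantal
1 16777216
1 1
2 5
2 7
end

* Group totals with -egen-, stored as float and as double
egen float  oplantal_f = sum(aantal), by(opl114)
egen double oplantal_d = sum(aantal), by(opl114)

* Group totals with -collapse-, which stores sums as double;
* (first) carries the -egen- totals along for comparison
collapse (sum) total_c = aantal (first) oplantal_f oplantal_d, by(opl114)

* Show the three totals side by side, with a flag for float disagreement
gen byte float_differs = (oplantal_f != total_c)
format total_c oplantal_f oplantal_d %12.0f
list opl114 total_c oplantal_f oplantal_d float_differs
```

If the float total differs while the double total matches -collapse-,
the storage type is the likely cause.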

To view the ado files for these commands you can download -adoedit- from
the SSC, or run:

. findfile collapse.ado

. view "`r(fn)'"

. findfile egen.ado

. view "`r(fn)'"

or more specifically:

. findfile _gsum.ado

. view "`r(fn)'"

Dan Blanchette
Applications Analyst Programmer
Carolina Population Center UNC-CH

> While aggregating a dataset using collapse some strange results were
> obtained:
> collapse (sum) aantal, by(opl114)
> Did not give the same results as the same dataset gave in other programs.
> Checking with
> egen oplantal=sum(aantal),by(opl114)
> though gave exactly the same (correct) number that other programs gave me.
> Can somebody explain to me how the summation (could) differ between
> collapse and egen?
> Thanks.
