Re: st: Spss's aggregate vs stata's collapse.

From   Ulrich Kohler
Subject   Re: st: Spss's aggregate vs stata's collapse.
Date   Wed, 13 Apr 2011 11:52:17 +0200

On Wednesday, 13.04.2011, at 10:05 +0100, Brendan Halpin wrote:
> On Wed, Apr 13 2011, Amadou DIALLO wrote:
> > Hi,
> > I am translating spss commands to stata and have trouble with different outputs.
> > Results are different after "aggregate" for ceb (children ever born).
> If the two files are exactly identical at the collapse/aggregate point
> (and that's worth verifying, as the generate/if and compute/if commands
> will not necessarily be identical in the case of missing values on the
> right hand side), I would guess it has to do with SPSS and Stata
> handling weights differently in this situation. You could test this by
> re-running the manipulation without weights. Note the
> "negative/zero/missing weight" warning you get with SPSS. 
> If that is the problem, one possible workaround is to handle the weights
> yourself: multiply ceb by the weight variable, and sum the result in the
> -collapse- statement. 
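
The workaround quoted above could be sketched roughly as follows. This is only a minimal sketch: the variable names wgt (weight) and region (grouping variable) are hypothetical, and dropping non-positive or missing weights is my assumption about what the SPSS warning implies, not something verified against SPSS.

```stata
* Hypothetical names: ceb (children ever born), wgt (weight), region (group)
* Weighted value and weight, kept only where both are usable
generate double wceb = ceb * wgt if wgt > 0 & !missing(ceb, wgt)
generate double wsum = wgt       if wgt > 0 & !missing(ceb, wgt)

* Sum both within groups; no [weight] is passed to -collapse-
collapse (sum) wceb wsum, by(region)

* Weighted mean per group, if the mean rather than the total is wanted
generate double mean_ceb = wceb / wsum
```

Because the weighting is done by hand, whatever -collapse- and SPSS's aggregate do internally with the weights no longer matters, which makes the two outputs easier to compare.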

This reminds me of something: SPSS might itself be inconsistent in how
it handles weights. In Stata, doing something like this

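. sysuse auto, clear
. * d1: coefficient on foreign from a weighted regression
. reg price foreign [aweight=gear_ratio]
. scalar d1 = _b[foreign]

. * d2: difference between the two weighted group means from -summarize-
. sum price if !foreign [aweight=gear_ratio]
. scalar d2 = r(mean)
. sum price if foreign [aweight=gear_ratio]
. scalar d2 = r(mean)-d2

. * d3: the same difference from weighted -collapse- (row 1 domestic, row 2 foreign)
. collapse price [aweight=gear_ratio], by(foreign)
. scalar d3 = price[2]-price[1]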

yields (almost) identical results for the scalars d1, d2, and d3:

. scalar list d1 d2 d3
        d1 =   478.0205
        d2 =   478.0205
        d3 =  478.02051

The last time I checked (some years ago), this was not the case in SPSS.
With non-integer weights, SPSS yielded a different result for d1 than
for d2 and d3. If I remember correctly, SPSS-aggregate and
SPSS-descriptives seemed to use some kind of rounding for non-integer
weights, although I did not find out what kind of rounding they used.

I wonder whether this observation is still valid.

