Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.


Re: st: Spss's aggregate vs stata's collapse.


From   Amadou DIALLO <stata.diallo@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Spss's aggregate vs stata's collapse.
Date   Wed, 13 Apr 2011 11:57:39 +0100

Brendan, Uli,
Thanks for your answers. Yes, it has to do with the weights: without them,
SPSS and Stata give the same results. Apparently SPSS rounds non-integer
weights to the nearest integer (the total weighted frequency, not the
individual weights; sic!):
www.spsstools.net/Tutorials/WEIGHTING.pdf
I've tried Brendan's solution, but it does not work. So far I can't
reproduce the SPSS results and am stuck. I will continue checking.
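
For reference, a minimal sketch of the hand-weighting workaround Brendan
describes (multiply ceb by the weight and sum the result in -collapse-).
The names wgt (weight variable) and region (grouping variable) are
hypothetical, since the thread does not give them, and the last line
only assumes that SPSS rounds the total weighted frequency per group;
that behaviour is not verified here:

. * hypothetical names: ceb (outcome), wgt (weight), region (group)
. generate double ceb_w = ceb * wgt
. * keep the weight only where ceb is observed, so both sums use the same cases
. generate double w_used = wgt if !missing(ceb)
. collapse (sum) ceb_w w_used, by(region)
. * weighted mean per group
. generate double ceb_mean = ceb_w / w_used
. * assumption only: mimic a rounded total weighted frequency
. generate double ceb_mean_r = ceb_w / round(w_used)

With positive weights, ceb_mean should agree with
-collapse (mean) ceb [aweight=wgt], by(region)-, while ceb_mean_r can be
compared with the SPSS output to see whether rounding of the group
totals accounts for the difference.
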
Best regards.
Bachir.

2011/4/13, Ulrich Kohler <kohler@wzb.eu>:
> On Wednesday, 13.04.2011, at 10:05 +0100, Brendan Halpin wrote:
>> On Wed, Apr 13 2011, Amadou DIALLO wrote:
>>
>> > Hi,
>> > I am translating spss commands to stata and have trouble with different
>> > outputs.
>> > Results are different after "aggregate" for ceb (children ever born).
>>
>> If the two files are exactly identical at the collapse/aggregate point
>> (and that's worth verifying, as the generate/if and compute/if commands
>> will not necessarily be identical in the case of missing values on the
>> right hand side), I would guess it has to do with SPSS and Stata
>> handling weights differently in this situation. You could test this by
>> re-running the manipulation without weights. Note the
>> "negative/zero/missing weight" warning you get with SPSS.
>>
>> If that is the problem, one possible workaround is to handle the weights
>> yourself: multiply ceb by the weight variable, and sum the result in the
>> -collapse- statement.
>
> This reminds me of something. SPSS might itself be inconsistent in how
> it handles weights. In Stata, doing something like this
>
> . sysuse auto, clear
> . reg price for [aweight=gear_ratio]
> . scalar d1 = _b[foreign]
>
> . sum price if !for [aweight=gear_ratio]
> . scalar d2 = r(mean)
> . sum price if for [aweight=gear_ratio]
> . scalar d2 = r(mean)-d2
>
> . collapse price [aweight=gear_ratio], by(for)
> . scalar d3 = price[2]-price[1]
>
> yields (almost) identical results for scalars d1, d2, d3:
>
> . scalar list d1 d2 d3
>         d1 =   478.0205
>         d2 =   478.0205
>         d3 =  478.02051
>
> The last time I checked (some years ago) this was not the case in SPSS.
> With non-integer weights, SPSS yielded different results for d1 than
> for d2 and d3. If I remember correctly, SPSS-aggregate and
> SPSS-descriptives seemed to use some kind of rounding for non-integer
> weights, although I did not find out what kind of rounding they used.
>
> I wonder whether this observation is still valid.
>
> Uli
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


-- 

Amadou B. DIALLO, PhD.

Economist (Anti-Poverty Programs - DR Congo), AFTP3, The World Bank,
Washington DC.

Director, Center for Research and Training on Adult Health and Education.
Mayotte (FRANCE). www.aprosasoma.org

© Copyright 1996–2014 StataCorp LP