Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)

Date: Thu, 14 Feb 2013 18:35:29 +0000

On the contrary, I think your argument was mostly very clear; the only detail that was unclear to me was what rounding you were suggesting, but the later post made that explicit.

It's up to StataCorp to respond. I agree with you that you have exposed a problem. My own suggestion is that the problem should be documented via an FAQ, but it's manifestly not my decision. StataCorp certainly place a high priority on reproducing textbook examples and results from other software, and on working out why results differ when they do.

Nick

On Thu, Feb 14, 2013 at 4:58 PM, Marta García-Granero
<mgarciagranero@gmail.com> wrote:
> Since English is not my native tongue, I did not express myself very well.
> When I talked about rounding, I did not mean rounding the original
> data, but applying the round() function to the absolute differences before
> ranking them.
>
> Some time ago, while preparing some slides with Excel (just for classes; I NEVER
> use Excel for serious research), I found the same problem: some differences
> that should have been the same were in fact different (below the 15th decimal
> place) and got a wrong rank assigned. I discovered that ranking
> "round(absdiff,1e-15)" eliminated the problem, since the data were compared
> only up to the 15th decimal place and declared equal or different correctly.
> In another message I sent shortly before this one, I suggested applying
> the same method to signrank.ado; it fixed the problem with wrong
> ranking (I tested it myself before posting).
>
> Concerning SPSS, since their code is compiled and hidden, and more protected
> than Coke's formula, I can only guess from the Acrobat documentation and my
> hand calculations that they have somehow circumvented the problem of those
> nasty little differences below the 15th decimal place.
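Marta's idea of rounding the absolute differences before ranking can be sketched outside Stata. The Python below is only an illustration, not a patch to signrank.ado: the helper `average_ranks` and its `ndigits` argument are hypothetical, and Python's `round(x, 15)` stands in for Stata's `round(absdiff, 1e-15)`, which rounds in a similar but not necessarily identical way. Tied values get the average of the ranks they occupy, which is what `egen`'s `rank()` does by default.

```python
# Sketch in Python (not Stata's internals): average ranks for ties, with
# optional rounding of the keys before ties are detected.
# average_ranks and ndigits are hypothetical names used for illustration.


def average_ranks(values, ndigits=None):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    keys = [round(v, ndigits) if ndigits is not None else v for v in values]
    # A key with L smaller keys and E equal keys (itself included)
    # occupies ranks L+1 .. L+E, whose mean is L + (E + 1) / 2.
    return [sum(k < key for k in keys) + (sum(k == key for k in keys) + 1) / 2
            for key in keys]


# Three absolute differences from 0.6 that are all exactly 0.05 in decimal.
absdiff = [abs(0.65 - 0.6), abs(0.55 - 0.6), abs(0.65 - 0.6)]

print(average_ranks(absdiff))              # [2.5, 1.0, 2.5]: tie missed
print(average_ranks(absdiff, ndigits=15))  # [2.0, 2.0, 2.0]: tie recognized
```

Rounding to a fixed number of decimals is not a complete fix: two doubles that straddle a rounding boundary can still round apart, which is part of why Nick asks how StataCorp could justify any particular rounding.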
>
> Maybe I was a bit too bold (being just a 2-month-old Stata user) in suggesting
> the modification of signrank.ado, but I am checking it with different
> datasets (from statistics books), and the results obtained with Stata, SPSS,
> and the ones shown in those books agree.
>
> Regards,
> MGG
>
> On 14/02/2013 17:32, Nick Cox wrote:
>>
>> Surprising though it may seem in the face of this carefully presented
>> evidence, I wouldn't call this a bug, at least not one that is
>> fixable.
>>
>> It's an anomaly and it's awkward, but it's not a bug.
>>
>> First off, a look at the code for -signrank- suggests that Stata uses
>> -double- precision where possible, and that's as far as ado code goes.
>>
>> It's an anomaly and it's awkward, but if it were a bug there would be
>> a solution, and Marta's suggestion that there be "some rounding",
>> whatever that means precisely, does not sound like a good solution,
>> because how is StataCorp supposed to justify what rounding it does,
>> and how does that fit in with anybody else's idea of what the correct
>> procedure is, exactly and reproducibly? For example, which
>> authoritative accounts say you should apply some rounding first to get
>> reproducible results?
>>
>> Also, Marta has a solid argument that when you have a rank procedure,
>> and data that all come presented to 2 decimal places, you should
>> get exactly the same result when the data are multiplied by 100 and become
>> integers. That's totally sound logic: the results of ranking are
>> invariant under multiplication of the originals by a positive
>> constant. But that's not the only consideration. The other
>> consideration is that people reasonably expect this test to be
>> applicable to non-integer data, and so Stata's code has to work within
>> the constraints that implies.
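Nick's invariance argument holds in exact arithmetic; the failure only appears once decimal values are mapped to binary doubles. Here is a minimal Python sketch, assuming a small subset of Marta's copper values chosen purely for illustration. `fractions.Fraction` keeps the decimal strings exact, while `float` is an IEEE 754 double, the same storage type as Stata's `double`. The helper `average_ranks` is hypothetical.

```python
from fractions import Fraction


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    # A value with L smaller values and E equal values (itself included)
    # occupies ranks L+1 .. L+E, whose mean is L + (E + 1) / 2.
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


# A few copper values, kept as decimal strings so Fraction stores them exactly.
raw = ["0.70", "0.65", "0.55", "0.45", "0.75", "0.65"]

exact = [abs(Fraction(s) - Fraction("0.6")) for s in raw]   # exact decimals
scaled = [abs(Fraction(s) * 100 - 60) for s in raw]         # exact integers
binary = [abs(float(s) - 0.6) for s in raw]                 # IEEE doubles

print(average_ranks(scaled))                           # [4.0, 2.0, 2.0, 5.5, 5.5, 2.0]
print(average_ranks(exact) == average_ranks(scaled))   # True: scaling by 100 is harmless
print(average_ranks(binary) == average_ranks(scaled))  # False: binary splits some ties
```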
>>
>> The underlying fact, often rehearsed on this list, is that Stata does
>> not do, and does not claim to do, exact decimal arithmetic unless
>> there is an exact binary equivalent of that decimal calculation. So
>> the heart of the matter is that Stata will very occasionally give what
>> look like wrong answers to decimal problems, as in the case of
>>
>> . di %21x 0.70 - 0.65
>> +1.9999999999990X-005
>>
>> . di %21x 0.65 - 0.6
>> +1.99999999999a0X-005
>>
>> Every smart child knows that the answers to these problems should be the
>> same, but they aren't when mapped to the nearest equivalent problems
>> in binary.
>>
>> I can't comment on exactly what SPSS does; that's clearly pertinent too.
>>
>> Nick
>>
>> On Thu, Feb 14, 2013 at 4:02 PM, Marta García-Granero
>> <mgarciagranero@gmail.com> wrote:
>>>
>>> Apologies for sending this twice, but yesterday I tried to piggyback onto
>>> another thread ("Rounding Errors Stata 12"), which is closely related to
>>> this question, but I think my question got lost. Besides, I'm going to
>>> explain the problem a bit more (and better).
>>>
>>> I'm converting some class notes (basic statistics) from SPSS to Stata,
>>> and I have found that the way Stata ranks tied data in the Wilcoxon test
>>> can sometimes be wrong when data have decimal places, even using
>>> -double- everywhere.
>>>
>>> The sample dataset comes from the online e-book Statistics at Square One
>>> (exercise at the end of chapter 1). I am using Stata 12.1 64-bit (latest
>>> update installed) on Windows 7, but I found the same problem with Stata 12.1
>>> 32-bit on Windows XP. The results I get using Stata don't match the ones
>>> I got either with my hand calculations or with SPSS.
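Nick's `%21x` comparison, and the tiny discrepancies Marta reports, can be reproduced in Python. Its `float.hex()` prints the exact binary significand of a double in much the same way as Stata's `%21x` display format; only the notation differs (`p-5` where Stata shows `X-005`). The copper-specific pair in the second half is an illustrative choice from Marta's data.

```python
# Python's float.hex() shows the exact binary value of an IEEE double,
# much like Stata's %21x display format.
a = 0.70 - 0.65
b = 0.65 - 0.6
print(a.hex())  # 0x1.9999999999990p-5  (Stata: +1.9999999999990X-005)
print(b.hex())  # 0x1.99999999999a0p-5  (Stata: +1.99999999999a0X-005)
print(a == b)   # False

# The same split appears in the copper data: |0.65 - 0.6| and |0.55 - 0.6|
# are both 0.05 in decimal, but they are different doubles.
d1 = abs(0.65 - 0.6)
d2 = abs(0.55 - 0.6)
print(d1 == d2)       # False
print(d2.hex())       # 0x1.9999999999990p-5, the same double as a
```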
>>>
>>> set type double
>>> input copper
>>> 0.70
>>> 0.45
>>> 0.72
>>> 0.30
>>> 1.16
>>> 0.69
>>> 0.83
>>> 0.74
>>> 1.24
>>> 0.77
>>> 0.65
>>> 0.76
>>> 0.42
>>> 0.94
>>> 0.36
>>> 0.98
>>> 0.64
>>> 0.90
>>> 0.63
>>> 0.55
>>> 0.78
>>> 0.10
>>> 0.52
>>> 0.42
>>> 0.58
>>> 0.62
>>> 1.12
>>> 0.86
>>> 0.74
>>> 1.04
>>> 0.65
>>> 0.66
>>> 0.81
>>> 0.48
>>> 0.85
>>> 0.75
>>> 0.73
>>> 0.50
>>> 0.34
>>> 0.88
>>> end
>>>
>>> * One-sample Wilcoxon test (against population median = 0.6)
>>>
>>> signrank copper = 0.6
>>>
>>> * Multiply the data by 100 to get rid of decimal places and run the test
>>> * again (pop. median = 60); this time all the output (positive and negative
>>> * sums of ranks, z statistic and p-value) is correct
>>>
>>> generate copper100 = round(copper*100)
>>> signrank copper100 = 60
>>>
>>> * Generate the ranks of the absolute differences between copper and the
>>> * pop. median for both variables (copper and copper100).
>>> * The ranks should be the same in both cases, but they are not.
>>> * Notice the differences for cases 5/6/7, 18/19, 22/23/24, 29/30, 32/33.
>>> * "ranks2" is correct (it recognizes all tied data) and leads to the right
>>> * Wilcoxon p-value.
>>>
>>> egen double ranks1 = rank(abs(copper-0.6))
>>> egen double ranks2 = rank(abs(copper100-60))
>>> generate absdiff = abs(copper-0.6)
>>> sort absdiff
>>> list absdiff ranks1 ranks2
>>>
>>> I would label that as a Stata bug. Tied absolute differences are not
>>> recognized as such because there is a difference at the 15th decimal place.
>>> Maybe some rounding should be performed before assigning ranks.
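As a cross-check outside Stata, the signed-rank sums for Marta's full dataset can be recomputed on the original scale and on the ×100 scale. This Python sketch computes only the classic Wilcoxon sums of average ranks of |d|. It does not reproduce -signrank-'s variance or z statistic, and it does not special-case zero differences, since none of the 40 copper values equals 0.6. The helpers `average_ranks` and `signed_rank_sums` are hypothetical names.

```python
# Marta's 40 copper values (Statistics at Square One, chapter 1 exercise).
COPPER = [
    0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
    0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
    0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
    0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88,
]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def signed_rank_sums(diffs):
    """Return (sum of ranks where d > 0, sum of ranks where d < 0)."""
    ranks = average_ranks([abs(d) for d in diffs])
    pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return pos, neg


# Analogue of: signrank copper = 0.6  (IEEE doubles)
float_diffs = [c - 0.6 for c in COPPER]
# Analogue of: generate copper100 = round(copper*100) ; signrank copper100 = 60
int_diffs = [round(c * 100) - 60 for c in COPPER]

print("copper   :", signed_rank_sums(float_diffs))
print("copper100:", signed_rank_sums(int_diffs))
```

Python floats and Stata doubles are both IEEE 754 double precision, so the floating-point version should break the same ties that `ranks1` breaks, while the integer version keeps them, as `ranks2` does. In both versions the two sums add up to n(n+1)/2 = 820, because average ranks preserve the total.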
>>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)** *From:* Marta García-Granero <mgarciagranero@gmail.com>
- **Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Wrong results for Wilcoxon signed ranks test when data have decimal places (even using double)** *From:* Marta García-Granero <mgarciagranero@gmail.com>
