



Re: st: Rounding Errors Stata 12


From   Marta García-Granero <mgarciagranero@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Rounding Errors Stata 12
Date   Wed, 13 Feb 2013 17:18:41 +0100

Talking about rounding errors, I have found what I think is a bug in the way Stata sometimes handles tied differences before ranking them for Wilcoxon's signed-rank test.

The sample data comes from exercise 1, chapter 1 of "Statistics at Square One" (available as electronic resource here: http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/1-data-display-and-summary )

I have used this example for many years in my classes, both with hand calculations and with SPSS (the statistical package we had at my University until recently). When I use Stata instead to test whether the population median is 0.6, I get different results:

. signrank cobre = 0.6

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |       28       591.5         410
    negative |       12       228.5         410
        zero |        0           0           0
-------------+---------------------------------
         all |       40         820         820

unadjusted variance     5535.00
adjustment for ties       -0.75
adjustment for zeros       0.00
                     ----------
adjusted variance       5534.25

Ho: cobre = 0.6
             z =   2.440
    Prob > |z| =   0.0147
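
As a cross-check on the arithmetic in this output, here is a minimal sketch in Python (not Stata) that recomputes z and the two-sided p-value from the reported rank sum, assuming the usual normal approximation for the signed-rank statistic. The variable names are illustrative only; the tie adjustment of -0.75 is copied from the output rather than derived.

```python
import math

n = 40                      # number of nonzero differences (from the output)
observed_positive = 591.5   # sum of positive ranks reported by Stata
ties_adjustment = -0.75     # "adjustment for ties" copied from the output

# Expected rank sum and unadjusted variance under the null hypothesis.
expected = n * (n + 1) / 4                        # 410.0
unadjusted_var = n * (n + 1) * (2 * n + 1) / 24   # 5535.0
adjusted_var = unadjusted_var + ties_adjustment   # 5534.25

# Normal approximation: z statistic and two-sided p-value.
z = (observed_positive - expected) / math.sqrt(adjusted_var)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, Prob > |z| = {p_two_sided:.4f}")
```

The result agrees with the z = 2.440 and Prob > |z| = 0.0147 shown above, so the discrepancy with SPSS comes from the rank sums themselves, not from the final formula.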

SPSS gives (and I get the same result by hand):

Ranks
                      N   Mean Rank  Sum of Ranks
    Negative Ranks    28    21.00    588.00
    Positive Ranks    12    19.33    232.00
    Zero               0
    Total             40

Test Statistics

Z    -2.393
Asymp. Sig. (2-tailed)    0.017

As you can see, the rank sums (and, therefore, the Z statistic) are different.

After a bit of experimenting, I have found that Stata handles tied differences with opposite signs incorrectly, though not consistently. In the listing below, the last column (ranked100, displayed as rank~100) has the correct ranks, while "ranked" contains the values that Stata uses to get the positive and negative rank sums. Notice the difference for cases 5/6/7, 18/19, 22/23/24, 29/30 and 32/33. In every case, the wrong ranking involves differences with opposite signs, but the problem is not systematic (see cases 1/2, where the tie is recognized, or 11/12, 13/14...). I used "double" for all the generated variables to avoid the known float problems.

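* Signed and absolute differences from the hypothesized median (0.6)
generate double difs = (cobre-0.6)
generate double absdifs = abs(cobre-0.6)
* Ranks of the raw absolute differences (these match what signrank uses)
egen double ranked = rank(absdifs)
* Ranks after scaling to hundredths and rounding to integers
generate double absdifs100 = 100*abs(cobre-0.6)
egen double ranked100 = rank(abs(round(absdifs100)))
sort absdifs
list cobre difs ranked ranked100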

     +----------------------------------+
     | cobre   difs   ranked   rank~100 |
     |----------------------------------|
  1. |   .58   -.02      1.5        1.5 |
  2. |   .62    .02      1.5        1.5 |
  3. |   .63    .03        3          3 |
  4. |   .64    .04        4          4 |
  5. |   .55   -.05        5          6 |
     |----------------------------------|
  6. |   .65    .05      6.5          6 |
  7. |   .65    .05      6.5          6 |
  8. |   .66    .06        8          8 |
  9. |   .52   -.08        9          9 |
 10. |   .69    .09       10         10 |
     |----------------------------------|
 11. |    .7     .1     11.5       11.5 |
 12. |    .5    -.1     11.5       11.5 |
 13. |   .48   -.12     13.5       13.5 |
 14. |   .72    .12     13.5       13.5 |
 15. |   .73    .13       15         15 |
     |----------------------------------|
 16. |   .74    .14     16.5       16.5 |
 17. |   .74    .14     16.5       16.5 |
 18. |   .45   -.15       18       18.5 |
 19. |   .75    .15       19       18.5 |
 20. |   .76    .16       20         20 |
     |----------------------------------|
 21. |   .77    .17       21         21 |
 22. |   .42   -.18     22.5         23 |
 23. |   .42   -.18     22.5         23 |
 24. |   .78    .18       24         23 |
 25. |   .81    .21       25         25 |
     |----------------------------------|
 26. |   .83    .23       26         26 |
 27. |   .36   -.24       27         27 |
 28. |   .85    .25       28         28 |
 29. |   .34   -.26       29       29.5 |
 30. |   .86    .26       30       29.5 |
     |----------------------------------|
 31. |   .88    .28       31         31 |
 32. |    .3    -.3       32       32.5 |
 33. |    .9     .3       33       32.5 |
 34. |   .94    .34       34         34 |
 35. |   .98    .38       35         35 |
     |----------------------------------|
 36. |  1.04    .44       36         36 |
 37. |    .1    -.5       37         37 |
 38. |  1.12    .52       38         38 |
 39. |  1.16    .56       39         39 |
 40. |  1.24    .64       40         40 |
     +----------------------------------+
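
The pattern in this listing can be reproduced outside Stata. The sketch below is in Python, whose floats are IEEE 754 doubles; it assumes that Stata's double type behaves the same way, which is an assumption about the cause, not a diagnosis of Stata's internals. It uses the 40 cobre values transcribed from the listing. The helper names (average_ranks, signed_rank_sums) are made up for this illustration.

```python
# Copper values transcribed from the listing above (40 observations).
cobre = [0.58, 0.62, 0.63, 0.64, 0.55, 0.65, 0.65, 0.66, 0.52, 0.69,
         0.70, 0.50, 0.48, 0.72, 0.73, 0.74, 0.74, 0.45, 0.75, 0.76,
         0.77, 0.42, 0.42, 0.78, 0.81, 0.83, 0.36, 0.85, 0.34, 0.86,
         0.88, 0.30, 0.90, 0.94, 0.98, 1.04, 0.10, 1.12, 1.16, 1.24]


def average_ranks(values):
    """Rank ascending; exactly equal values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1   # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def signed_rank_sums(diffs):
    """Return (sum of ranks of positive diffs, sum of ranks of negative diffs)."""
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


# Differences taken directly in binary floating point.
raw = signed_rank_sums([x - 0.6 for x in cobre])

# Differences taken on integer hundredths, where every tie is exact.
scaled = signed_rank_sums([round(100 * x) - 60 for x in cobre])

print(abs(0.55 - 0.6) == abs(0.65 - 0.6))   # False: this tie is lost
print(abs(0.58 - 0.6) == abs(0.62 - 0.6))   # True: this tie survives
print(raw)      # (591.5, 228.5)
print(scaled)   # (588.0, 232.0)
```

Under that assumption, the raw differences give the rank sums that Stata reports (591.5 and 228.5), while the integer-scaled differences give the hand-calculated and SPSS sums (588 and 232). Whether a tie survives depends only on how the individual decimal values happen to round in binary, which would explain why the problem looks unsystematic.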

I must say that this is the only example where I found differences between SPSS & Stata's output.

Regards,
Prof. Marta Garcia-Granero, PhD
Department of Biochemistry and Genetics
University of Navarra
SPAIN.