
# Re: st: Rounding Errors Stata 12

From: Marta García-Granero
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Rounding Errors Stata 12
Date: Wed, 13 Feb 2013 17:18:41 +0100

Talking about rounding errors, I have found what I think is a bug in the way Stata sometimes handles tied differences before ranking them for the Wilcoxon signed-rank test.
The sample data come from exercise 1, chapter 1 of "Statistics at Square One" (available as an electronic resource here: http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/1-data-display-and-summary )
I have used this example for many years in my classes, both with hand calculations and with SPSS as the statistical package (the one we had until recently at my University). When I use Stata instead to test whether the population median is 0.6, I get different results:
```
. signrank cobre = 0.6

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |       28       591.5         410
    negative |       12       228.5         410
        zero |        0           0           0
-------------+---------------------------------
         all |       40         820         820

unadjusted variance     5535.00
adjustment for ties       -0.75
adjustment for zeros       0.00
                     ----------
adjusted variance       5534.25

Ho: cobre = 0.6
             z =   2.440
    Prob > |z| =   0.0147
```

SPSS (and I get the same result by hand) gives:

```
Ranks
                   N   Mean Rank   Sum of Ranks
Negative Ranks    28       21.00         588.00
Positive Ranks    12       19.33         232.00
Zero               0
Total             40

Test Statistics
Z                          -2.393
Asymp. Sig. (2-tailed)      0.017
```

As you can see, the rank sums (and, therefore, the Z statistics) are different.
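As a quick arithmetic check on the two outputs, here is a minimal sketch in Python (not Stata) that recomputes the normal-approximation z from a positive rank sum, using the expected value n(n+1)/4 and the unadjusted variance n(n+1)(2n+1)/24 that appear in the Stata output. The names `signed_rank_z` and `two_sided_p` are illustrative. The tie adjustment of 0.75 is taken from the Stata output; the value 1.875 is an assumption, obtained as the sum of (t^3 - t)/48 over tie groups when seven tied pairs and two tied triples are all recognized.

```python
import math

def signed_rank_z(t_plus, n, tie_adjustment=0.0):
    """Normal-approximation z for a positive rank sum (no zero differences)."""
    expected = n * (n + 1) / 4                     # 410 for n = 40
    variance = n * (n + 1) * (2 * n + 1) / 24      # 5535 for n = 40
    return (t_plus - expected) / math.sqrt(variance - tie_adjustment)

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Rank sum and tie adjustment as reported by Stata.
z_stata = signed_rank_z(591.5, 40, tie_adjustment=0.75)

# Rank sum as reported by SPSS; 1.875 is an assumed adjustment
# (7 tied pairs and 2 tied triples: 7 * 6/48 + 2 * 24/48).
z_all_ties = signed_rank_z(588.0, 40, tie_adjustment=1.875)

print(round(z_stata, 3), round(two_sided_p(z_stata), 4))
print(round(z_all_ties, 3), round(two_sided_p(z_all_ties), 3))
```

Under these assumptions the two calls should land on the magnitudes reported by Stata and SPSS respectively, which suggests the whole discrepancy comes from how the ties are ranked and not from the test formula.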
After a bit of experimenting, I have found that Stata handles tied differences with opposite signs incorrectly, but not systematically. The last column (rank~100) has the correct ranks, while "ranked" contains the same values that Stata uses to get the positive and negative sums of ranks. Notice the difference for cases 5/6/7, 18/19, 22/23/24, 29/30 and 32/33. In all of these cases, the wrong ranking involves differences with opposite signs, but this is not systematic (see cases 1/2, where the tie is recognized, or 11/12, 13/14...). I used "double" in all the generated variables to avoid the known float problems.
```
generate double difs = (cobre-0.6)
generate double absdifs = abs(cobre-0.6)
egen double ranked = rank(absdifs)
generate double absdifs100 = 100*abs(cobre-0.6)
egen double ranked100 = rank(abs(round(absdifs100)))
sort absdifs
list cobre difs ranked ranked100

     +----------------------------------+
     | cobre   difs   ranked   rank~100 |
     |----------------------------------|
  1. |   .58   -.02      1.5        1.5 |
  2. |   .62    .02      1.5        1.5 |
  3. |   .63    .03        3          3 |
  4. |   .64    .04        4          4 |
  5. |   .55   -.05        5          6 |
     |----------------------------------|
  6. |   .65    .05      6.5          6 |
  7. |   .65    .05      6.5          6 |
  8. |   .66    .06        8          8 |
  9. |   .52   -.08        9          9 |
 10. |   .69    .09       10         10 |
     |----------------------------------|
 11. |    .7     .1     11.5       11.5 |
 12. |    .5    -.1     11.5       11.5 |
 13. |   .48   -.12     13.5       13.5 |
 14. |   .72    .12     13.5       13.5 |
 15. |   .73    .13       15         15 |
     |----------------------------------|
 16. |   .74    .14     16.5       16.5 |
 17. |   .74    .14     16.5       16.5 |
 18. |   .45   -.15       18       18.5 |
 19. |   .75    .15       19       18.5 |
 20. |   .76    .16       20         20 |
     |----------------------------------|
 21. |   .77    .17       21         21 |
 22. |   .42   -.18     22.5         23 |
 23. |   .42   -.18     22.5         23 |
 24. |   .78    .18       24         23 |
 25. |   .81    .21       25         25 |
     |----------------------------------|
 26. |   .83    .23       26         26 |
 27. |   .36   -.24       27         27 |
 28. |   .85    .25       28         28 |
 29. |   .34   -.26       29       29.5 |
 30. |   .86    .26       30       29.5 |
     |----------------------------------|
 31. |   .88    .28       31         31 |
 32. |    .3    -.3       32       32.5 |
 33. |    .9     .3       33       32.5 |
 34. |   .94    .34       34         34 |
 35. |   .98    .38       35         35 |
     |----------------------------------|
 36. |  1.04    .44       36         36 |
 37. |    .1    -.5       37         37 |
 38. |  1.12    .52       38         38 |
 39. |  1.16    .56       39         39 |
 40. |  1.24    .64       40         40 |
     +----------------------------------+

```
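The two rank columns can also be reproduced outside Stata. The following is a minimal sketch in Python, assuming the 40 `cobre` values are exactly those shown in the listing and that they are held in double precision; the helper names `average_ranks` and `signed_rank_sums` are illustrative. It ranks the absolute differences once as computed and once after scaling by 100 and rounding, mirroring the `ranked` and `ranked100` variables. It also checks one possible explanation for the unrecognized ties, namely that a pair such as 0.55 and 0.65 does not sit at exactly the same distance from 0.6 in binary floating point. That explanation is an assumption here, not something confirmed for Stata's internals.

```python
from collections import defaultdict

# Values copied from the listing (assumed to be the complete data set).
cobre = [0.58, 0.62, 0.63, 0.64, 0.55, 0.65, 0.65, 0.66, 0.52, 0.69,
         0.70, 0.50, 0.48, 0.72, 0.73, 0.74, 0.74, 0.45, 0.75, 0.76,
         0.77, 0.42, 0.42, 0.78, 0.81, 0.83, 0.36, 0.85, 0.34, 0.86,
         0.88, 0.30, 0.90, 0.94, 0.98, 1.04, 0.10, 1.12, 1.16, 1.24]

def average_ranks(values):
    """Rank from 1 upward; exactly equal values share the mean of their positions."""
    positions = defaultdict(list)
    for pos, v in enumerate(sorted(values), start=1):
        positions[v].append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def signed_rank_sums(diffs, keys):
    """Return (positive, negative) rank sums, ranking by `keys`."""
    ranks = average_ranks(keys)
    pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return pos, neg

diffs = [x - 0.6 for x in cobre]
raw_keys = [abs(d) for d in diffs]                    # like "absdifs"
rounded_keys = [round(100 * abs(d)) for d in diffs]   # like "absdifs100", rounded

# Are cases 5 and 6 tied in double precision?
print(abs(0.55 - 0.6) == abs(0.65 - 0.6))

print("raw keys:    ", signed_rank_sums(diffs, raw_keys))
print("rounded keys:", signed_rank_sums(diffs, rounded_keys))
```

Ranking on the rounded integer keys forces ties to be detected by exact integer equality, which is the same idea as the `ranked100` variable above.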
I must say that this is the only example in which I have found differences between SPSS and Stata output.
```
Regards,
Prof. Marta Garcia-Granero, PhD
Department of Biochemistry and Genetics
University of Navarra
SPAIN.
```
