Re: st: is this the correct statistical test to compare non-normally distributed count data

From   Nick Cox
To   "[email protected]" <[email protected]>
Date   Wed, 22 Jan 2014

This could be argued several ways. One short summary is that you've
not told us enough about your data to allow really good advice.

If your variable is a count, then in principle there is an important
distinction: whether values could (much) exceed 9 or could only
be in a limited set, 0(1)9 (or 0(1)10, or whatever). You called them
scores, so perhaps despite your word "count" they are really ordinal
grades and not defined by being counted.

If the distribution is (strongly) discrete, then it can't be normal
and -swilk- is from one point of view incorrect and irrelevant. It
could be approximately normal, however, other than the discreteness,
and many researchers would take the opposite point of view and swallow
the discreteness.

But the overall distribution is not quite the question. You fed all
the data to -swilk- but with two groups that's not the whole story.
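
A minimal sketch of looking at each group separately, assuming the
data are in memory with the variables a_score and group named in the
quoted output below. Nothing here has been run on the original data.

```stata
* Assumes a_score (values 0 to 9) and group (two levels) are in memory

* Frequencies of each score within each group, with column percentages
tabulate a_score group, column

* Summary statistics (mean, SD, percentiles, skewness) for each group
bysort group: summarize a_score, detail
```

With only ten possible values, the two-way table shows the whole of
both distributions at a glance.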

All that said, it wouldn't surprise me if a t-test produced a P-value
loosely similar to your -ranksum- result. That's the way t-tests often
work; in many cases they don't depend that strongly on normality
(although outliers etc. can be problematic).
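
One way to check that conjecture, sketched under the assumption that
a_score and group are the variables in the quoted output below, is to
run the t-test alongside -ranksum- and compare the P-values. The
results are not known here.

```stata
* Two-sample t-test of mean a_score across the two groups
ttest a_score, by(group)

* Same comparison without assuming equal variances
* (Satterthwaite's approximation to the degrees of freedom)
ttest a_score, by(group) unequal
```

The group sizes are very unequal (4504 and 92), so the unequal-variances
version is worth a look.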

The dichotomy "either something is normal, or we have to retreat to
nonparametric testing" is (in my view) 1950s thinking. There is a whole
bundle of possible tests depending on what an appropriate distribution
is for your data.
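
Two illustrative possibilities from that bundle, offered as a sketch
and not as a recommendation for these data: which model (if either)
fits depends on what a_score really is. The code assumes that group
is a numeric variable with value labels; a string variable would need
-encode- first.

```stata
* If a_score is a genuine count: Poisson regression on a group indicator,
* with robust standard errors in case the variance differs from the mean
poisson a_score i.group, vce(robust)

* If a_score is better seen as an ordinal grade: ordered logit
ologit a_score i.group
```

In each case the test of the coefficient on group plays the role of
the two-group comparison.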

Yet more: a t-test compares means. Is that your objective, comparing
means? If it's your objective then that question can't be answered by
-ranksum-, as -ranksum- says nothing about means. I have to wonder
whether your objective is comparing the distributions, in which case
you are going to learn most from a graphical comparison, not a
significance test.
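
A minimal sketch of one such graphical comparison, again assuming the
variable names a_score and group from the quoted output below:

```stata
* One bar per distinct score, scaled as percentages,
* with a separate panel for each group
histogram a_score, discrete percent by(group)
```

Percentages, not frequencies, keep the panels comparable when one
group has 4504 observations and the other 92.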

On 21 January 2014 16:07, Gwinyai Masukume wrote:

> I have the variable a_score which can take the values 0, 1, 2 up to 9.
>  I have two groups and I want to compare if a_score is the same
> between the two groups. Since a_score is not normally distributed I
> have used a non-parametric test and the p-value shows that a_score is
> not significantly different between the two groups if p < 0.05 is
> considered significant.
> Have I used the correct test?
> Kind regards,
> Gwinyai
> . swilk a_score
>                    Shapiro-Wilk W test for normal data
>     Variable |    Obs       W           V         z       Prob>z
> -------------+--------------------------------------------------
>      a_score |   4610    0.99456     13.698     6.850    0.00000
> .
> . * non-parametric test
> . ranksum a_score, by(group)
> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>        group |      obs    rank sum    expected
> -------------+---------------------------------
>      Group 1 |     4504    10338974    10352444
>      Group 2 |       92    224932.5      211462
> -------------+---------------------------------
>     combined |     4596    10563906    10563906
> unadjusted variance   1.587e+08
> adjustment for ties  -2084329.2
>                      ----------
> adjusted variance     1.567e+08
> Ho: a_score(group==Group 1) = a_score(group==Group 2)
>              z =  -1.076
>     Prob > |z| =   0.2818
