Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <[email protected]>

To: "[email protected]" <[email protected]>

Subject: Re: st: is this the correct statistical test to compare non-normally distributed count data

Date: Wed, 22 Jan 2014 10:25:32 +0000

This could be argued several ways. One short summary is that you've not told us enough about your data to allow really good advice.

If your variable is a count then in principle, there is an important distinction: whether values could (much) exceed 9 or values could only be in a limited set, 0(1)9 (or 0(1)10, or whatever). You called them scores, so perhaps despite your word "count" they are really ordinal grades and not defined by being counted.

If the distribution is (strongly) discrete, then it can't be normal and -swilk- is from one point of view incorrect and irrelevant. It could be approximately normal, however, other than the discreteness, and many researchers would take the opposite point of view and swallow the discreteness.

But the overall distribution is not quite the question. You fed all the data to -swilk- but with two groups that's not the whole story.

All that said, it wouldn't surprise me if a t-test produced a P-value loosely similar to your -ranksum- result. That's the way t-tests often work; in many cases they don't depend that strongly on normality (although outliers etc. can be problematic).

The dichotomy "either something is normal, or we have to retreat to nonparametric testing" is (in my view) 1950s thinking. There is a whole bundle of possible tests depending on what an appropriate distribution is for your data.

Yet more: a t-test compares means. Is that your objective, comparing means? If it's your objective then that question can't be answered by -ranksum-, as -ranksum- says nothing about means.

I have to wonder whether your objective is comparing the distributions, in which case you are going to learn most from a graphical comparison, not a significance test.

Nick
[email protected]

On 21 January 2014 16:07, Gwinyai Masukume <[email protected]> wrote:
> I have the variable a_score which can take the values 0, 1, 2 up to 9.
> I have two groups and I want to compare if a_score is the same
> between the two groups. Since a_score is not normally distributed I
> have used a non-parametric test and the p-value shows that a_score is
> not significantly different between the two groups if p < 0.05 is
> considered significant.
>
> Have I used the correct test?
>
> Kind regards,
> Gwinyai
>
> . swilk a_score
>
>                    Shapiro-Wilk W test for normal data
>
>     Variable |    Obs       W           V         z       Prob>z
> -------------+--------------------------------------------------
>      a_score |   4610    0.99456     13.698     6.850    0.00000
>
> .
> . * non-parametric test
> . ranksum a_score, by(group)
>
> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>
>        group |      obs    rank sum    expected
> -------------+---------------------------------
>      Group 1 |     4504    10338974    10352444
>      Group 2 |       92    224932.5      211462
> -------------+---------------------------------
>     combined |     4596    10563906    10563906
>
> unadjusted variance    1.587e+08
> adjustment for ties   -2084329.2
>                       ----------
> adjusted variance      1.567e+08
>
> Ho: a_score(group==Group 1) = a_score(group==Group 2)
>              z =  -1.076
>     Prob > |z| =   0.2818
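
The points in the reply (look at each group rather than the pooled data, use a t-test only if means are the target, and compare the distributions graphically) could be followed up along these lines. This is only a sketch, not a recommended analysis: it assumes the variables are named a_score and group as in the quoted output, and that group is numeric with value labels. Which of these commands is relevant depends on what the scores are and what the comparison is for.

```stata
* Sketch only: assumes a_score and group exist as in the quoted output,
* with group a numeric variable carrying value labels.

* Look at the distribution within each group, not just the pooled data
tabulate a_score group, column
bysort group: summarize a_score, detail

* Graphical comparison of the two distributions
histogram a_score, discrete percent by(group)

* Only if comparing means is the objective
ttest a_score, by(group)
ttest a_score, by(group) unequal
```

With 4504 observations in one group and 92 in the other, the column percentages from -tabulate- and the by-group histogram may be easier to read than raw frequencies.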


**References**:

- **st: is this the correct statistical test to compare non-normally distributed count data**
  *From:* Gwinyai Masukume <[email protected]>
