# st: comparing count-data distributions

From: "Nick Cox" | Subject: st: comparing count-data distributions | Date: Wed, 25 Feb 2004 11:17:43 -0000

```
I'm forwarding this on behalf of Toby Robertson.

(It still looks chi-square to me.)

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: tob_123 [mailto:tob_123@yahoo.co.uk]

Two species of butterfly, red and yellow, are observed in different
numbers at four sites, A to D, on a single occasion. Is there a
statistically significant difference between the species in the
distribution of each across the four sites?

Let's assume (for now) that each individual's choice of site is
independent of the choices of other individuals, i.e. there is no
flocking or clustering effect. Let Pr(A) and Py(A) be the
probabilities that a red or a yellow butterfly, respectively, attends
Site A (and so on). We want to test the null hypothesis
that:

Pr(A) = Py(A), Pr(B) = Py(B), Pr(C) = Py(C)

against the alternative that at least one of these equalities is
untrue. (The relationship between Pr(D) and Py(D) follows from the
restriction that Pr() and Py() each sum to one.)

The data consist of a count of each species at each site. The totals
are not necessarily equal, i.e. there may be more or fewer reds than
yellows overall.

If we were looking at the distribution of only one species (for
example, testing the hypothesis that individuals frequent each site
with equal probability), a chi-squared goodness-of-fit test would be
appropriate. But here we are comparing the distributions of the two
species, without hypothesising what the probabilities associated with
those distributions might be.

What is the form of the test? And how is it implemented in Stata?

Toby Robertson
Sofia, Bulgaria

```
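Nick's one-line reply points to the usual answer. When the site probabilities are not hypothesised, Toby's null is a chi-square test of homogeneity on the 2 × 4 species-by-site table of counts. In that test the expected count in each cell is (row total × column total) / grand total. Under the null the Pearson statistic has (2 - 1)(4 - 1) = 3 degrees of freedom, which matches the three free equalities in the null hypothesis above. The block below is a minimal standard-library Python sketch of that test, written here for illustration rather than taken from the thread. It sits alongside the one-species equal-probability goodness-of-fit test Toby mentions so that the two can be compared. It assumes independent individuals, as the post does, and expected counts large enough for the chi-square approximation to be reasonable. The function names are hypothetical. In Stata the same Pearson test is reported by `tabulate` with the `chi2` option. For example, `tabulate species site [fweight=count], chi2` works on data with one observation per species-site combination (the variable names `species`, `site` and `count` are hypothetical). Alternatively, `tabi` accepts the counts typed in directly.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Computed as the regularized upper incomplete gamma Q(df/2, x/2): a power
    series when x/2 < df/2 + 1, otherwise a continued fraction (modified Lentz).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # log of z**a * exp(-z) / Gamma(a), shared by both branches
    log_prefix = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); the upper tail is 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, z) directly.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefix)


def chi2_homogeneity(table):
    """Pearson test that every row (species) has the same distribution over columns (sites).

    Assumes every row total and column total is positive.
    Returns (statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity uses each species' own total.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


def chi2_equal_probability(counts):
    """Goodness-of-fit test that one species visits each site with equal probability."""
    n, k = sum(counts), len(counts)
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, k - 1, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    # Hypothetical counts at sites A-D, invented only to show the calls.
    red = [30, 12, 25, 8]
    yellow = [22, 18, 40, 20]
    stat, df, p = chi2_homogeneity([red, yellow])
    print(f"homogeneity (red vs yellow): X2 = {stat:.3f}, df = {df}, p = {p:.4f}")
    stat, df, p = chi2_equal_probability(red)
    print(f"equal probability (red only): X2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

The homogeneity test does not require the red and yellow totals to be equal, because each cell's expected count is scaled by that species' own total. That is the situation the post describes. If some expected counts are small, the `exact` option of Stata's `tabulate` (Fisher's exact test) is an alternative to relying on the chi-square approximation.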