The Stata listserver

st: comparing count-data distributions

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: comparing count-data distributions
Date   Wed, 25 Feb 2004 11:17:43 -0000

I'm forwarding this on behalf of Toby Robertson. 

(It still looks chi-square to me.) 

[email protected] 

-----Original Message-----
From: tob_123 [mailto:[email protected]]

Two species of butterfly, red and yellow, are observed in different 
numbers at four sites, A to D, on a single occasion. Is there a 
statistically significant difference between the two species in how 
they are distributed across the four sites?

Let's assume (for now) that the probability of each individual 
attending a given site is independent of other individuals - i.e. 
there is no flocking or clustering effect. And let Pr(A) and Py(A) 
be the respective probabilities of a red or yellow butterfly 
attending Site A (and so on). We want to test the null hypothesis 

Pr(A) = Py(A), Pr(B) = Py(B), Pr(C) = Py(C)

against the alternative that at least one of these equalities is 
untrue. (The relationship between Pr(D) and Py(D) follows from the 
restriction that Pr() and Py() each sum to one.) 

The data consist of a count of each species at each site. The totals 
are not necessarily equal, i.e. there may be more or fewer reds than 
yellows overall.

If we were looking at the distribution of just one species (for 
example, testing the hypothesis that individuals frequent each site 
with equal probability), a chi-squared goodness-of-fit test would be 
appropriate. But here 
we are comparing the distributions of the two species, without 
hypothesising what the probabilities associated with those 
distributions might be.

What is the form of the test? And how is it implemented in Stata?

Toby Robertson
Sofia, Bulgaria
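
One way to read the "looks chi-square" remark above is as a Pearson chi-squared test of homogeneity on the 2 x 4 table of counts (species by site), with (2-1)(4-1) = 3 degrees of freedom. In Stata that is usually obtained from -tabulate- with the chi2 option, or from -tabi- when the counts are typed in directly. The sketch below only illustrates the arithmetic behind such a test. It is written in Python, the function names and the counts are hypothetical, and it assumes independent individuals, positive column totals, and expected counts large enough for the chi-squared approximation to be reasonable.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = sum(h ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-h) * terms
    # Odd df: erfc term plus a finite sum over half-integer powers.
    m = df // 2
    tail = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def homogeneity_test(table):
    """Pearson chi-squared test on a table of counts (rows = species)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both species share the same site probabilities.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts at sites A-D; not data from the original post.
red = [12, 30, 8, 10]
yellow = [25, 20, 15, 20]
stat, df, p = homogeneity_test([red, yellow])
print(f"chi2({df}) = {stat:.3f}, p = {p:.4f}")
```

Unequal species totals are not a problem here, because the expected counts are built from the observed row and column totals rather than from hypothesised site probabilities.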
