# st: Non-standard categorical data test - help!

From: "K Jensen"
To: statalist
Subject: st: Non-standard categorical data test - help!
Date: Wed, 10 Dec 2008 12:09:00 +0000

```I have data from five different populations that looks like this:-
     +--------------------+
     | pop   a   b      n |
     |--------------------|
  1. |   1   0   0    325 |
  2. |   1   0   1     77 |
  3. |   1   1   0     59 |
  4. |   1   1   1      9 |
  5. |   2   0   0    788 |
     |--------------------|
  6. |   2   0   1    262 |
  7. |   2   1   0     99 |
  8. |   2   1   1     28 |
  9. |   3   0   0    270 |
 10. |   3   0   1     91 |
     |--------------------|
 11. |   3   1   0     40 |
 12. |   3   1   1      6 |
 13. |   4   0   0    311 |
 14. |   4   0   1     84 |
 15. |   4   1   0     35 |
     |--------------------|
 16. |   4   1   1      9 |
 17. |   5   0   0    281 |
 18. |   5   0   1     85 |
 19. |   5   1   0     28 |
 20. |   5   1   1      5 |
     +--------------------+

where each population has counts of the number of observations (n) in
each of the four categories created by the possible values of two
factors A and B.

I would like to test the a priori hypothesis that there are fewer
observations with both A=1 and B=1 than would be expected if A and B
were independent, i.e.
H0: p(A=1,B=1)_i = p(A=1)_i * p(B=1)_i for all i = 1,...,5   versus
H1: p(A=1,B=1)_i < p(A=1)_i * p(B=1)_i
and to get a single p-value from it.

I don't think you can aggregate this into one big chi-square test on
the pooled counts:
         A=0    A=1
B=0     1975    261
B=1      599     57
because there is no physical reason to expect p(A) and p(B) to be the
same for all i, and indeed it looks as though they are different.
Also, would that not be a general test of non-independence, rather than
a test aimed specifically at a directional departure in the A=1, B=1
cell?

I tried doing this as what I have seen described as a "Replicated
G-test of independence" (I had to do this in Excel - if anyone knows
how to do it in Stata, that would be really useful for future
projects) and got the following results:-
Pop                  G     df   p
1                 1.46      1   0.2271
2                 0.53      1   0.4681
3                 3.74      1   0.0533
4                 0.02      1   0.9002
5                 1.23      1   0.2679
Total G           6.96      5   0.2233
Pooled G          4.84      1   0.0278
Heterogeneity G   2.12      4   0.7137
i.e. the heterogeneity G (total G minus pooled G) does not itself rule
out pooling the results like this

Is there any way in Stata to test my hypothesis and get a single p-value?

Thank you
Karin
```
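The replicated G-test arithmetic described in the post can be checked outside a spreadsheet. The sketch below is in Python rather than Stata and uses only the standard library. The counts are copied from the listing in the post; the helper names `g_stat` and `chi2_sf` are made up for this illustration and are not part of any package. It computes a G statistic per population, their sum (total G), the G of the pooled 2x2 table, and the heterogeneity G as total minus pooled, each with a chi-square tail probability. It assumes the usual degrees of freedom for this layout (1 per table, k for the total, k - 1 for heterogeneity) and does not address the one-sided hypothesis.

```python
import math

# Counts from the listing in the post: pop -> (a0b0, a0b1, a1b0, a1b1)
counts = {
    1: (325, 77, 59, 9),
    2: (788, 262, 99, 28),
    3: (270, 91, 40, 6),
    4: (311, 84, 35, 9),
    5: (281, 85, 28, 5),
}


def g_stat(n00, n01, n10, n11):
    """G statistic for independence in a 2x2 table (hypothetical helper)."""
    total = n00 + n01 + n10 + n11
    row = (n00 + n01, n10 + n11)  # totals for a = 0 and a = 1
    col = (n00 + n10, n01 + n11)  # totals for b = 0 and b = 1
    g = 0.0
    for obs, i, j in ((n00, 0, 0), (n01, 0, 1), (n10, 1, 0), (n11, 1, 1)):
        if obs > 0:  # an empty cell contributes nothing to the sum
            g += obs * math.log(obs * total / (row[i] * col[j]))
    return 2.0 * g


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (hypothetical helper)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


k = len(counts)
per_pop_g = {pop: g_stat(*cells) for pop, cells in counts.items()}
total_g = sum(per_pop_g.values())

# Pooled table: add the five populations cell by cell
pooled = tuple(sum(cell) for cell in zip(*counts.values()))
pooled_g = g_stat(*pooled)

het_g = total_g - pooled_g
het_p = chi2_sf(het_g, k - 1)

for pop, g in per_pop_g.items():
    print(f"pop {pop}: G = {g:.2f}, df = 1, p = {chi2_sf(g, 1):.4f}")
print(f"total G = {total_g:.2f}, df = {k}, p = {chi2_sf(total_g, k):.4f}")
print(f"pooled G = {pooled_g:.2f}, df = 1, p = {chi2_sf(pooled_g, 1):.4f}")
print(f"heterogeneity G = {het_g:.2f}, df = {k - 1}, p = {het_p:.4f}")
```

The printed values can be compared line by line with the spreadsheet table in the post. The tail probabilities are worth checking in particular: for a chi-square with 4 df the upper tail at 2.12 is exp(-1.06) * (1 + 1.06), which is about 0.71.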