"K Jensen" <k.x.jensen@googlemail.com>

statalist <statalist@hsphsun2.harvard.edu>

st: Non-standard categorical data test - help!

Wed, 10 Dec 2008 12:09:00 +0000

I have data from five different populations that looks like this:-

     +--------------------+
     | pop   a   b     n  |
     |--------------------|
  1. |   1   0   0   325  |
  2. |   1   0   1    77  |
  3. |   1   1   0    59  |
  4. |   1   1   1     9  |
  5. |   2   0   0   788  |
     |--------------------|
  6. |   2   0   1   262  |
  7. |   2   1   0    99  |
  8. |   2   1   1    28  |
  9. |   3   0   0   270  |
 10. |   3   0   1    91  |
     |--------------------|
 11. |   3   1   0    40  |
 12. |   3   1   1     6  |
 13. |   4   0   0   311  |
 14. |   4   0   1    84  |
 15. |   4   1   0    35  |
     |--------------------|
 16. |   4   1   1     9  |
 17. |   5   0   0   281  |
 18. |   5   0   1    85  |
 19. |   5   1   0    28  |
 20. |   5   1   1     5  |
     +--------------------+

where each population has counts of the number of observations (n) in each
of the four categories created by the possible values of two factors A and B.

I would like to test the a priori hypothesis that there should be fewer
observations with both A=1 and B=1 than would be expected if A and B were
independent, i.e.:-

  Ho: p(A=1,B=1)_i = p(A=1)_i * p(B=1)_i   for all i, i = 1,...,5

versus

  H1: p(A=1,B=1)_i < p(A=1)_i * p(B=1)_i

... to get a single p-value.

I don't think you can aggregate this into one big chi-square test with the
observations:

  1975   261
   599    57

because there is no physical reason to expect p(A) and p(B) to be the same
for all i, and indeed it looks as though they are different.

Also, is that a more general test of non-independence, rather than one that
looks specifically at a directional departure at A=1 and B=1?

I tried doing this as what I have seen described as a "Replicated G-test of
independence" (I had to do this in Excel - if anyone knows how to do it in
Stata, that would be really useful for future projects) and got the
following results:-

  Pop                  G    df        p
  1                 1.46     1   0.2271
  2                 0.53     1   0.4681
  3                 3.74     1   0.0533
  4                 0.02     1   0.9002
  5                 1.23     1   0.2679

  Total G           6.96     5   0.2233
  Pooled G          4.84     1   0.0278
  Heterogeneity G   2.12     4   0.0000

i.e. the heterogeneity G suggests that you can't pool the results like this.

Is there any way in Stata to test my hypothesis and get a single p-value?
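
For anyone who wants to check the spreadsheet arithmetic, the replicated G-test described above can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: it is Python rather than Stata, the helper names `g_stat` and `chi2_sf` are made up for illustration, the heterogeneity G is taken as total G minus pooled G with df = 5 - 1, and every test here is the ordinary two-sided test of independence, not the directional hypothesis H1.

```python
import math

# Counts per population in the order (a=0,b=0), (a=0,b=1), (a=1,b=0), (a=1,b=1),
# copied from the listing above.
counts = {
    1: (325, 77, 59, 9),
    2: (788, 262, 99, 28),
    3: (270, 91, 40, 6),
    4: (311, 84, 35, 9),
    5: (281, 85, 28, 5),
}

def g_stat(n00, n01, n10, n11):
    """G (likelihood-ratio) statistic for independence in a 2x2 table."""
    obs = [[n00, n01], [n10, n11]]
    total = n00 + n01 + n10 + n11
    row = [n00 + n01, n10 + n11]
    col = [n00 + n10, n01 + n11]
    g = 0.0
    for i in range(2):
        for j in range(2):
            if obs[i][j] > 0:  # empty cells contribute nothing to the sum
                expected = row[i] * col[j] / total
                g += obs[i][j] * math.log(obs[i][j] / expected)
    return 2.0 * g

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if df % 2 == 0:
        term = math.exp(-x / 2)
        total = term
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

# One G per population (1 df each), then their sum (5 df).
per_pop_g = {pop: g_stat(*cells) for pop, cells in counts.items()}
total_g = sum(per_pop_g.values())

# Pooled G: add the cells over populations, then test the single 2x2 table.
pooled_cells = [sum(cells[k] for cells in counts.values()) for k in range(4)]
pooled_g = g_stat(*pooled_cells)

# Heterogeneity G: what is left of the total after removing the pooled part.
het_g = total_g - pooled_g
het_df = len(counts) - 1
het_p = chi2_sf(het_g, het_df)

for pop, g in per_pop_g.items():
    print(f"pop {pop}: G = {g:.2f}, p = {chi2_sf(g, 1):.4f}")
print(f"Total G = {total_g:.2f}, df = {len(counts)}, p = {chi2_sf(total_g, len(counts)):.4f}")
print(f"Pooled G = {pooled_g:.2f}, df = 1, p = {chi2_sf(pooled_g, 1):.4f}")
print(f"Heterogeneity G = {het_g:.2f}, df = {het_df}, p = {het_p:.4f}")
```

Recomputing each p-value from its G and df this way is a quick check on the figures in the table above, the heterogeneity row in particular.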
Thank you,

Karin

