
From: "Steichen, Thomas J." <SteichT@RJRT.com>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: Anova and Contrasts with missing cells
Date: Fri, 24 Oct 2008 17:18:33 -0400

Listmembers,

I have a question about contrasts after ANOVA for the following example dataset, which I also summarize via -table- below:

    input round size nnn nnn_adjm
    1 600 .532 .532
    1 600 .573 .573
    1 600 .581 .581
    2 600 .609 .609
    2 600 .465 .465
    2 600 .593 .593
    3 400 .413 .5756667
    3 400 .406 .5686666
    3 400 .418 .5806667
    3 800 .725 .5623333
    3 800 .815 .6523333
    3 800 .673 .5103333
    4 600 .552 .552
    4 600 .585 .585
    4 600 .588 .588
    4 600 .733 .733
    4 600 .608 .608
    5 600 .640 .640
    5 600 .643 .643
    5 600 .906 .906
    5 600 .853 .853
    5 600 .847 .847
    end

    . table round size, c(mean nnn sd nnn n nnn) row col for(%7.3f)

    ---------------------------------------------------
              |                 size
        round |       400       600       800     Total
    ----------+----------------------------------------
            1 |               0.562               0.562
              |               0.026               0.026
              |                   3                   3
              |
            2 |               0.556               0.556
              |               0.079               0.079
              |                   3                   3
              |
            3 |     0.412               0.738     0.575
              |     0.006               0.072     0.184
              |         3                   3         6
              |
            4 |               0.613               0.613
              |               0.070               0.070
              |                   5                   5
              |
            5 |               0.778               0.778
              |               0.127               0.127
              |                   5                   5
              |
        Total |     0.412     0.644     0.738     0.625
              |     0.006     0.125     0.072     0.142
              |         3        16         3        22
    ---------------------------------------------------

The interesting feature of this dataset is that round 3 has data at two 'size' levels (400 and 800) that differ from the single 'size' (600) used in all other rounds. It is also notable that the sample size in rounds 4 and 5 (5 observations each) differs from that in rounds 1, 2 and 3 (3 observations per cell).

If one does an ANOVA on the nnn data followed by contrasts, any contrast not involving round 3 seems reasonable; however, those involving round 3 seem dubious. The examples below show the ANOVA and three example contrasts.

    . anova nnn round size|round

        Number of obs =      22     R-squared     =  0.7465
        Root MSE      = .082094     Adj R-squared =  0.6673

        Source |  Partial SS    df          MS          F   Prob > F
    -----------+----------------------------------------------------
         Model |  .317523495     5  .063504699       9.42     0.0002
               |
         round |  .158760825     4  .039690206       5.89     0.0041
    size|round |   .15876267     1   .15876267      23.56     0.0002
               |
      Residual |  .107829606    16   .00673935
    -----------+----------------------------------------------------
         Total |  .425353101    21   .02025491
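The cell summaries and the Residual and size|round lines of this ANOVA can be cross-checked by hand. Because round + size|round has one parameter per non-empty round/size cell, the residual SS should be the pooled within-cell SS on 22 - 6 = 16 df, and the size|round SS should be the between-size SS inside round 3. The Python sketch below is a hypothetical illustration of that arithmetic only, not how Stata's -anova- computes anything; the names DATA, cells, residual_ms and nested_ss are made up for this example, and the data are the nnn values from the -input- block above.

```python
from statistics import mean, stdev

# nnn values from the -input- block above, as (round, size, nnn)
DATA = [
    (1, 600, .532), (1, 600, .573), (1, 600, .581),
    (2, 600, .609), (2, 600, .465), (2, 600, .593),
    (3, 400, .413), (3, 400, .406), (3, 400, .418),
    (3, 800, .725), (3, 800, .815), (3, 800, .673),
    (4, 600, .552), (4, 600, .585), (4, 600, .588), (4, 600, .733), (4, 600, .608),
    (5, 600, .640), (5, 600, .643), (5, 600, .906), (5, 600, .853), (5, 600, .847),
]


def cells(data):
    """Group nnn by (round, size); only the non-empty cells appear as keys."""
    out = {}
    for rnd, size, y in data:
        out.setdefault((rnd, size), []).append(y)
    return out


def residual_ms(data):
    """Pooled within-cell SS, its df and the mean square.

    round + size|round has one parameter per non-empty cell, so its
    residual is the variation of observations around their own cell means.
    """
    groups = cells(data)
    ss = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in groups.values())
    df = len(data) - len(groups)  # 22 observations - 6 non-empty cells = 16
    return ss, df, ss / df


def nested_ss(data, rnd):
    """Between-size SS inside one round, i.e. that round's share of size|round."""
    groups = [ys for (r, _), ys in cells(data).items() if r == rnd]
    round_mean = mean(y for ys in groups for y in ys)
    return sum(len(ys) * (mean(ys) - round_mean) ** 2 for ys in groups)


if __name__ == "__main__":
    for (rnd, size), ys in sorted(cells(DATA).items()):
        print(f"round {rnd} size {size}: mean {mean(ys):.3f} "
              f"sd {stdev(ys):.3f} n {len(ys)}")
    ss, df, ms = residual_ms(DATA)
    print(f"Residual SS {ss:.7f} on {df} df, MS {ms:.8f}")
    print(f"size|round SS (round 3) {nested_ss(DATA, 3):.7f}")
```

Run as a script, it prints one line per non-empty cell, which should agree with the -table- output, then a residual SS of about .1078296 on 16 df (MS about .00673935) and a size|round SS of about .1587627, in line with the ANOVA table.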
    . test _coef[round[1]] = _coef[round[2]]

     ( 1)  round[1] - round[2] = 0

           F(  1,    16) =    0.01
                Prob > F =    0.9259

    . test _coef[round[1]] = _coef[round[3]]

     ( 1)  round[1] - round[3] = 0

           F(  1,    16) =    6.87
                Prob > F =    0.0185

    . test _coef[round[1]] = _coef[round[4]]

     ( 1)  round[1] - round[4] = 0

           F(  1,    16) =    0.73
                Prob > F =    0.4057

What is odd about the second contrast is that the means of rounds 1 and 3 differ by only 0.013 units (the means in the first contrast differ by 0.006 and those in the third by 0.051, with fairly similar sd's). So why is contrast 2 significant? (In fact, any contrast involving round 3 seems wrong.)

To explore this further, I created a new variable, nnn_adjm, where _adjm stands for "adjusted mean". The adjustment applies to round 3 alone: its two 'size' subsets are shifted to have the same mean. (For other rounds, the values are retained as is.) In Stata, starting from a dataset that holds only round, size and nnn, something like:

    bysort round size: egen m_size = mean(nnn)
    bysort round: egen m_round = mean(nnn)
    gen nnn_adjm = cond(round == 3, nnn - m_size + m_round, nnn)

That is, within round 3 we subtract from each observation the mean of its size subset and add back the round-3 mean (over both sizes). This, effectively, is what ANOVA does to account for the size effect. This gives us the following summary statistics:

    . table round size, c(mean nnn_adjm sd nnn_adjm n nnn_adjm) row col for(%7.3f)

    ---------------------------------------------------
              |                 size
        round |       400       600       800     Total
    ----------+----------------------------------------
            1 |               0.562               0.562
              |               0.026               0.026
              |                   3                   3
              |
            2 |               0.556               0.556
              |               0.079               0.079
              |                   3                   3
              |
            3 |     0.575               0.575     0.575
              |     0.006               0.072     0.046
              |         3                   3         6
              |
            4 |               0.613               0.613
              |               0.070               0.070
              |                   5                   5
              |
            5 |               0.778               0.778
              |               0.127               0.127
              |                   5                   5
              |
        Total |     0.575     0.644     0.575     0.625
              |     0.006     0.125     0.072     0.113
              |         3        16         3        22
    ---------------------------------------------------

Note that the two size categories in round 3 now have the same mean but retain their sd's from before the adjustment. Now, if we repeat the ANOVA and contrasts on this adjusted variable, we get:
    . anova nnn_adjm round size|round

        Number of obs =      22     R-squared     =  0.5955
        Root MSE      = .082094     Adj R-squared =  0.4691

        Source |  Partial SS    df          MS          F   Prob > F
    -----------+----------------------------------------------------
         Model |  .158760831     5  .031752166       4.71     0.0078
               |
         round |  .158760831     4  .039690208       5.89     0.0041
    size|round |           0     1           0       0.00     1.0000
               |
      Residual |  .107829606    16   .00673935
    -----------+----------------------------------------------------
         Total |  .266590437    21  .012694783

    . test _coef[round[1]] = _coef[round[2]]

     ( 1)  round[1] - round[2] = 0

           F(  1,    16) =    0.01
                Prob > F =    0.9259

    . test _coef[round[1]] = _coef[round[3]]

     ( 1)  round[1] - round[3] = 0

           F(  1,    16) =    0.04
                Prob > F =    0.8487

    . test _coef[round[1]] = _coef[round[4]]

     ( 1)  round[1] - round[4] = 0

           F(  1,    16) =    0.73
                Prob > F =    0.4057

As expected, in the ANOVA the sum of squares for size|round is zero, and the SS for round and for the residual are the same as before (apart from a little meaningless roundoff error). Likewise, contrasts not involving round 3 are identical to those for the unadjusted data, but the one involving round 3 has changed greatly (from p = 0.0185 to p = 0.8487). These adjusted results seem much more reasonable (as do any other contrasts involving round 3).

If one compares these contrast results to what SAS or JMP produce, those not involving round 3 are identical to those of Stata. However, both SAS and JMP produce p = 0.8256 for the second contrast above. Generally, SAS and JMP produce p's for contrasts involving round 3 that are close to, but different from, those produced by Stata using the 'adjusted' data above. Also, SAS and JMP produce identical results using the raw vs. adjusted data (whether or not round 3 is involved in the contrast).

I will speculate that the difference in answers is due to the unequal sample sizes and/or the cells with no data. But the question remains: which is correct?

Tom
-----------------------------------
Thomas J. Steichen
steicht@rjrt.com
-----------------------------------
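The round 1 versus round 3 contrasts discussed above can be checked by hand from the cell means and the residual mean square. The Python sketch below only reproduces arithmetic; it does not implement Stata's, SAS's or JMP's parameterization. It assumes two hypothetical ways of writing "round 1 versus round 3" as a contrast of cell means: (a) round 1 against the size-800 cell of round 3 alone, and (b) round 1 against the equally weighted average of the two round-3 cells. Each F is the squared contrast estimate divided by the residual MS times the sum of (weight squared / cell n), on (1, 16) df, and the two-sided p-value comes from a Simpson's-rule integration of the Student-t density at t = sqrt(F). The helper names (adjust_round, contrast_f, t_two_sided_p) are made up for this example; adjust_round mimics the nnn_adjm construction.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean

# (round, size, nnn) from the -input- block in the post
DATA = [
    (1, 600, .532), (1, 600, .573), (1, 600, .581),
    (2, 600, .609), (2, 600, .465), (2, 600, .593),
    (3, 400, .413), (3, 400, .406), (3, 400, .418),
    (3, 800, .725), (3, 800, .815), (3, 800, .673),
    (4, 600, .552), (4, 600, .585), (4, 600, .588), (4, 600, .733), (4, 600, .608),
    (5, 600, .640), (5, 600, .643), (5, 600, .906), (5, 600, .853), (5, 600, .847),
]


def cells(data):
    """Group nnn by (round, size); only the non-empty cells appear as keys."""
    out = {}
    for rnd, size, y in data:
        out.setdefault((rnd, size), []).append(y)
    return out


def adjust_round(data, rnd):
    """Hypothetical helper mimicking nnn_adjm: inside round `rnd`, subtract each
    size-cell mean and add back the round mean; other rounds are left as is."""
    groups = cells(data)
    rnd_mean = mean(y for (r, _), ys in groups.items() if r == rnd for y in ys)
    return [(r, s, y - mean(groups[(r, s)]) + rnd_mean if r == rnd else y)
            for r, s, y in data]


def residual_ms(data):
    """Pooled within-cell mean square and its degrees of freedom."""
    groups = cells(data)
    ss = sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in groups.values())
    df = len(data) - len(groups)
    return ss / df, df


def contrast_f(data, weights):
    """F(1, df) for the cell-mean contrast sum(w * mean); weights maps cells to w."""
    groups = cells(data)
    ms, df = residual_ms(data)
    est = sum(w * mean(groups[c]) for c, w in weights.items())
    var = ms * sum(w * w / len(groups[c]) for c, w in weights.items())
    return est * est / var, df


def t_two_sided_p(t, df, steps=2000):
    """Two-sided Student-t p-value via Simpson's rule on the density (steps even)."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def dens(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    t = abs(t)
    h = t / steps
    total = dens(0) + dens(t) + sum((4 if i % 2 else 2) * dens(i * h)
                                    for i in range(1, steps))
    return 1 - 2 * total * h / 3


ONE_CELL = {(1, 600): 1.0, (3, 800): -1.0}                  # contrast (a)
AVERAGED = {(1, 600): 1.0, (3, 400): -0.5, (3, 800): -0.5}  # contrast (b)

if __name__ == "__main__":
    for label, d in (("raw", DATA), ("adjusted", adjust_round(DATA, 3))):
        for name, w in (("(a) size-800 cell", ONE_CELL), ("(b) cell average", AVERAGED)):
            f, df = contrast_f(d, w)
            print(f"{label:8s} {name}: F(1, {df}) = {f:.2f}, "
                  f"p = {t_two_sided_p(sqrt(f), df):.4f}")
```

Assuming the arithmetic is right, (a) gives F of about 6.87 (p about 0.0185) on the raw data and F of about 0.04 (p about 0.849) on the adjusted data, the values in the Stata tests above, while (b) gives F of about 0.05 and p of about 0.8256 on both versions of the data, the figure reported for SAS and JMP. That pattern is consistent with, though it does not prove, the programs testing different hypotheses about round 3.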
