ANOVA / ANCOVA

  • Balanced and unbalanced designs
  • Missing cells
  • Factorial, nested, and mixed designs
  • Repeated measures
  • Box, Greenhouse–Geisser, and Huynh–Feldt corrections

Afifi and Azen (1979) fitted a model of the change in systolic blood pressure for 58 patients, each suffering from one of three diseases, who were randomly assigned one of four different drug treatments:

. webuse systolic
(Systolic Blood Pressure Data)

. anova systolic drug disease drug#disease

             Number of obs =         58    R-squared     =  0.4560
             Root MSE      =    10.5096    Adj R-squared =  0.3259

      Source | Partial SS    df          MS       F   Prob > F
-------------+------------------------------------------------
       Model | 4259.33851    11  387.212591    3.51     0.0013
             |
        drug | 2997.47186     3  999.157287    9.05     0.0001
     disease | 415.873046     2  207.936523    1.88     0.1637
drug#disease | 707.266259     6   117.87771    1.07     0.3958
             |
    Residual | 5080.81667    46  110.452536
-------------+------------------------------------------------
       Total | 9340.15517    57  163.862371
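
The summary statistics in the header follow directly from the sums of squares in the table. As a quick check, the arithmetic can be reproduced with display and Stata's Ftail() function. This is only a sketch: the numbers are copied from the output above, and the // comments assume the lines are run from a do-file rather than typed interactively.

```stata
* Values are copied from the anova table above.
display 4259.33851 / 9340.15517                 // R-squared = Model SS / Total SS
display sqrt(110.452536)                        // Root MSE  = sqrt(Residual MS)
display 1 - (5080.81667/46) / (9340.15517/57)   // adjusted R-squared
display 999.157287 / 110.452536                 // F for drug = MS(drug) / MS(Residual)
display Ftail(3, 46, 999.157287 / 110.452536)   // Prob > F for drug
```

Up to rounding, the results should agree with the 0.4560, 10.5096, 0.3259, 9.05, and 0.0001 reported by anova.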

An important feature of Stata is that it does not have modes or modules. You do not enter an ANOVA module to fit an ANOVA model; you simply type the command. The advantage is that Stata's other commands can be interspersed to help you better understand the data. For instance, Stata's table command reveals that the data here are almost balanced:

. table drug disease, col row

          |     Patient's Disease
Drug Used |     1      2      3   Total
----------+----------------------------
        1 |     6      4      5      15
        2 |     5      4      6      15
        3 |     3      5      4      12
        4 |     5      6      5      16
          |
    Total |    19     19     20      58

table can also show how the mean increase in blood pressure varies by drug and disease:

. table drug disease, col row c(mean systolic) f(%8.2f)

          |       Patient's Disease
Drug Used |      1       2       3   Total
----------+-------------------------------
        1 |  29.33   28.25   20.40   26.07
        2 |  28.00   33.50   18.17   25.53
        3 |  16.33    4.40    8.50    8.75
        4 |  13.60   12.83   14.20   13.50
          |
    Total |  22.79   18.21   15.80   18.88
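
The col, row, c(), and f() options belong to the table syntax in use when this example was written. The table command was redesigned in Stata 17, so these exact commands may not run unchanged in newer releases. As a sketch of an alternative, tabulate can produce the same counts and cell means. The last line uses what we understand to be the newer table syntax; treat it as an assumption to check against your version's help file, which is why it is wrapped in capture noisily.

```stata
webuse systolic, clear

* Cell counts with row and column totals
tabulate drug disease

* Cell means of systolic only (suppress standard deviations and frequencies)
tabulate drug disease, summarize(systolic) means

* Assumed Stata 17+ equivalent of the table of means; an error here is
* displayed but does not stop the do-file
capture noisily table (drug) (disease), statistic(mean systolic) nformat(%8.2f)
```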

The test command allows you to perform tests directly on the coefficients of the underlying regression model. For instance, we can test if the coefficient on the third drug is equal to the coefficient on the fourth.

. test 3.drug = 4.drug

 ( 1)  3.drug - 4.drug = 0

       F(  1,    46) =    0.13
            Prob > F =    0.7234

We find that the two coefficients are not significantly different, at least at any significance level smaller than 73%.
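
test reports only the F statistic. If the size of the difference is also of interest, lincom gives the estimate with its standard error and confidence interval. The following is a minimal sketch that refits the model so the block is self-contained. Note that because the model includes drug#disease, this comparison of coefficients refers to the base level of disease (disease 1).

```stata
webuse systolic, clear
quietly anova systolic drug disease drug#disease

* Point estimate, standard error, and 95% CI for the same hypothesis
lincom 3.drug - 4.drug

* With one numerator degree of freedom, the squared t statistic equals the F statistic
display (r(estimate) / r(se))^2
```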

For more complex tests, the contrast command often provides a more concise way to specify the test we are interested in and prevents us from having to write tests in terms of the regression coefficients. With contrast, we instead specify our tests in terms of differences in the marginal means for the levels of a particular factor. For example, if we want to compare the third and fourth drugs, we can test the difference in the mean impact on systolic blood pressure separately for each disease using the @ operator. We also use the reverse adjacent operator, ar., to compare the fourth level of the drug with the previous level.

. contrast ar4.drug@disease

Contrasts of marginal linear predictions

Margins      : asbalanced

              |         df           F        P>F
--------------+----------------------------------
 drug@disease |
   (4 vs 3) 1 |          1        0.13     0.7234
   (4 vs 3) 2 |          1        1.76     0.1917
   (4 vs 3) 3 |          1        0.65     0.4230
        Joint |          3        0.85     0.4761
              |
  Denominator |         46

              |   Contrast   Std. Err.     [95% Conf. Interval]
--------------+------------------------------------------------
 drug@disease |
   (4 vs 3) 1 |  -2.733333   7.675156     -18.18262    12.71595
   (4 vs 3) 2 |   8.433333   6.363903     -4.376539    21.24321
   (4 vs 3) 3 |        5.7   7.050081     -8.491077    19.89108
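
Because the model contains the full drug#disease interaction, its marginal predictions for each cell should coincide with the cell means tabulated earlier, and each contrast is then a difference of two of them (for example, 12.83 - 4.40 = 8.43 for disease 2). A sketch using margins, again refitting the model so the block runs on its own:

```stata
webuse systolic, clear
quietly anova systolic drug disease drug#disease

* Predicted mean change in systolic blood pressure for every drug-disease cell
margins drug#disease

* The same adjacent contrast averaged over diseases, weighting each disease
* equally (contrast treats the factors as balanced by default)
contrast ar4.drug
```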

test and contrast can still access the estimates, even though two tabulations have intervened. Similarly, anova is integrated with Stata’s regress command for estimating linear regressions. We can review the underlying regression estimates by typing regress without arguments:

. regress

      Source |       SS       df       MS              Number of obs =      58
-------------+------------------------------           F( 11,    46) =    3.51
       Model |  4259.33851    11  387.212591           Prob > F      =  0.0013
    Residual |  5080.81667    46  110.452536           R-squared     =  0.4560
-------------+------------------------------           Adj R-squared =  0.3259
       Total |  9340.15517    57  163.862371           Root MSE      =   10.51

------------------------------------------------------------------------------
    systolic |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |
          2  |  -1.333333   6.363903    -0.21   0.835    -14.14321    11.47654
          3  |        -13   7.431438    -1.75   0.087    -27.95871    1.958708
          4  |  -15.73333   6.363903    -2.47   0.017    -28.54321   -2.923461
             |
     disease |
          2  |  -1.083333   6.783944    -0.16   0.874     -14.7387    12.57204
          3  |  -8.933333   6.363903    -1.40   0.167    -21.74321    3.876539
             |
drug#disease |
        2 2  |   6.583333   9.783943     0.67   0.504    -13.11072    26.27739
        2 3  |        -.9   8.999918    -0.10   0.921     -19.0159     17.2159
        3 2  |     -10.85   10.24353    -1.06   0.295    -31.46916    9.769157
        3 3  |        1.1   10.24353     0.11   0.915    -19.51916    21.71916
        4 2  |   .3166667   9.301675     0.03   0.973    -18.40663    19.03997
        4 3  |   9.533333   9.202189     1.04   0.306    -8.989712    28.05638
             |
       _cons |   29.33333   4.290543     6.84   0.000     20.69692    37.96975
------------------------------------------------------------------------------

In our original estimation, neither the main effect of disease nor the drug#disease interaction was significant. We might now compare our two-way factorial model with a simpler, one-way layout by testing the two terms jointly:

. test disease drug#disease

              Source | Partial SS    df          MS       F   Prob > F
---------------------+------------------------------------------------
disease drug#disease |     1126.1     8    140.7625    1.27     0.2801
            Residual | 5080.81667    46  110.452536

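The joint test above does not reject the one-way model (p = 0.2801), so dropping disease and the interaction is reasonable for these data. We could use either Stata's anova command or Stata's oneway command to fit a one-way model. Here we use oneway and request Bonferroni-adjusted pairwise comparisons: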

. oneway systolic drug, bonferroni

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      3133.23851      3   1044.41284      9.09     0.0001
 Within groups      6206.91667     54   114.942901
------------------------------------------------------------------------
    Total           9340.15517     57   163.862371

Bartlett's test for equal variances:  chi2(3) =   1.0063  Prob>chi2 = 0.800

        Comparison of Increment in Systolic B.P. by Drug Used
                            (Bonferroni)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |   -.533333
         |      1.000
         |
       3 |   -17.3167   -16.7833
         |      0.001      0.001
         |
       4 |   -12.5667   -12.0333       4.75
         |      0.012      0.017      1.000
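
The joint test of disease and drug#disease reported earlier is the usual F test comparing nested models, so it can be recovered from the residual sums of squares of the two fits (5080.81667 on 46 df and 6206.91667 on 54 df). The sketch below takes these from anova's stored results e(rss) and e(df_r). The scalar names are our own and purely illustrative.

```stata
webuse systolic, clear

* Full two-way factorial model
quietly anova systolic drug disease drug#disease
scalar rss_full = e(rss)
scalar df_full  = e(df_r)

* Reduced one-way model
quietly anova systolic drug
scalar rss_one = e(rss)
scalar df_one  = e(df_r)

* F = (increase in residual SS per dropped df) / (residual MS of the full model)
scalar Fstat = ((rss_one - rss_full) / (df_one - df_full)) / (rss_full / df_full)
display "F(" %1.0f (df_one - df_full) ", " %2.0f df_full ") = " %5.2f Fstat
display "Prob > F = " %6.4f Ftail(df_one - df_full, df_full, Fstat)
```

The result should agree with the F of 1.27 and p-value of 0.2801 from test disease drug#disease.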

Table 7.7 of Winer, Brown, and Michels (1991) provides a repeated-measures ANOVA example involving both nested and crossed terms. There are four dial shapes and two methods for calibrating dials. Subjects are nested within the calibration method, and an accuracy score is recorded for each subject on each dial shape.

Here is the Stata anova command for this problem.

. webuse t77
(T7.7 -- Winer, Brown, Michels)

. anova score calib / subject|calib shape calib#shape, repeated(shape)

               Number of obs =         24    R-squared     =  0.8925
               Root MSE      =    1.11181    Adj R-squared =  0.7939

        Source | Partial SS    df          MS       F   Prob > F
---------------+------------------------------------------------
         Model |    123.125    11  11.1931818    9.06     0.0003
               |
         calib | 51.0416667     1  51.0416667   11.89     0.0261
 subject|calib | 17.1666667     4  4.29166667
---------------+------------------------------------------------
         shape | 47.4583333     3  15.8194444   12.80     0.0005
   calib#shape | 7.45833333     3  2.48611111    2.01     0.1662
               |
      Residual | 14.8333333    12  1.23611111
---------------+------------------------------------------------
         Total | 137.958333    23  5.99818841

Between-subjects error term:  subject|calib
                     Levels:  6         (4 df)
     Lowest b.s.e. variable:  subject
     Covariance pooled over:  calib     (for repeated variable)

Repeated variable: shape
                               Huynh-Feldt epsilon        =  0.8483
                               Greenhouse-Geisser epsilon =  0.4751
                               Box's conservative epsilon =  0.3333

                                  ------------ Prob > F ------------
        Source |     df      F    Regular    H-F      G-G      Box
---------------+----------------------------------------------------
         shape |      3   12.80    0.0005   0.0011   0.0099   0.0232
   calib#shape |      3    2.01    0.1662   0.1791   0.2152   0.2291
      Residual |     12
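
Two details of this output can be checked by hand. The / in the command tells anova to test calib against subject|calib rather than against the residual, and each epsilon correction rescales both degrees of freedom of the F test for the repeated factor before the p-value is computed. The sketch below uses display and Ftail() with the rounded values printed above, so small discrepancies in the last digit are possible. It assumes the lines are run from a do-file.

```stata
* F for calib uses subject|calib as its error term
display 51.0416667 / 4.29166667
display Ftail(1, 4, 51.0416667 / 4.29166667)

* Epsilon-corrected p-values for shape: multiply both df (3 and 12) by epsilon
display Ftail(3*0.8483, 12*0.8483, 12.80)   // Huynh-Feldt
display Ftail(3*0.4751, 12*0.4751, 12.80)   // Greenhouse-Geisser
display Ftail(3*0.3333, 12*0.3333, 12.80)   // Box's conservative
```

Box's conservative epsilon is the smallest of the three, which is why it gives the largest p-value in the table.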

References

Afifi, A. A., and S. P. Azen. 1979. Statistical Analysis: A Computer-Oriented Approach. 2nd ed. New York: Academic Press.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New York: McGraw–Hill.