ANOVA / ANCOVA
- Balanced and unbalanced designs
- Missing cells
- Factorial, nested, and mixed designs
- Repeated measures
- Box, Greenhouse–Geisser, and Huynh–Feldt corrections
Afifi and Azen (1979) fitted a model of the change in systolic blood
pressure for 58 patients, each suffering from one of three diseases, who
were randomly assigned one of four different drug treatments:
. webuse systolic
(Systolic Blood Pressure Data)

. anova systolic drug disease drug*disease

                     Number of obs =      58     R-squared     =  0.4560
                     Root MSE      = 10.5096     Adj R-squared =  0.3259

      Source |  Partial SS    df       MS           F     Prob > F
-------------+----------------------------------------------------
       Model |  4259.33851    11  387.212591       3.51     0.0013
             |
        drug |  2997.47186     3  999.157287       9.05     0.0001
     disease |  415.873046     2  207.936523       1.88     0.1637
drug*disease |  707.266259     6   117.87771       1.07     0.3958
             |
    Residual |  5080.81667    46  110.452536
-------------+----------------------------------------------------
       Total |  9340.15517    57  163.862371
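As a quick check on how the table above fits together, each F statistic is the term's mean square divided by the residual mean square, and R-squared is the model sum of squares over the total. The sketch below retypes numbers from the output by hand; the scalar names are illustrative, not anything anova stores.

```stata
* Hand check of the drug row: F = MS(drug) / MS(residual)
scalar ms_drug  = 999.157287
scalar ms_resid = 110.452536
scalar F_drug   = ms_drug / ms_resid
display "F(3, 46)  = " %5.2f F_drug
display "Prob > F  = " %6.4f Ftail(3, 46, F_drug)

* R-squared = Model SS / Total SS
display "R-squared = " %6.4f 4259.33851/9340.15517
```

The displayed values should agree with the drug row and the R-squared in the header, up to rounding.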
An important feature of Stata is that it does not have modes or modules.
You do not enter an ANOVA module to fit an ANOVA model; you simply type
the command. The advantage of this design is that Stata's other commands
can be interspersed to help you better understand the data. For instance,
the data here are almost balanced, as revealed by Stata's table command:
. table drug disease, col row

--------------------------------------
          |     Patient's Disease
Drug Used |     1      2      3  Total
----------+---------------------------
        1 |     6      4      5     15
        2 |     5      4      6     15
        3 |     3      5      4     12
        4 |     5      6      5     16
          |
    Total |    19     19     20     58
--------------------------------------
table can also show how the mean increase in blood pressure varies
by drug and disease:
. table drug disease, col row c(mean systolic) f(%8.2f)

--------------------------------------
          |     Patient's Disease
Drug Used |     1      2      3  Total
----------+---------------------------
        1 | 29.33  28.25  20.40  26.07
        2 | 28.00  33.50  18.17  25.53
        3 | 16.33   4.40   8.50   8.75
        4 | 13.60  12.83  14.20  13.50
          |
    Total | 22.79  18.21  15.80  18.88
--------------------------------------
In the ANOVA results above, neither the main effect of disease nor the
drug*disease interaction was statistically significant. We might now
compare our two-way factorial model with a simpler, one-way layout by
testing the two terms jointly:
. test disease drug*disease

              Source |  Partial SS    df       MS           F     Prob > F
---------------------+----------------------------------------------------
disease drug*disease |     1126.10     8    140.7625       1.27     0.2801
            Residual |  5080.81667    46  110.452536
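The joint test above can be read as a comparison of two nested models. Its sum of squares is the amount by which the residual sum of squares grows when disease and drug*disease are dropped. The sketch below checks this by hand, taking the one-way residual (6206.91667 on 54 df) from the oneway output shown later in this section; the scalar names are illustrative.

```stata
* Residual SS of the drug-only model minus residual SS of the two-way model
scalar ss_diff = 6206.91667 - 5080.81667
scalar df_diff = 54 - 46

* F uses the residual mean square of the larger (two-way) model
scalar F_joint = (ss_diff/df_diff) / 110.452536
display "SS = " %9.2f ss_diff "   df = " df_diff
display "F(8, 46) = " %5.2f F_joint "   Prob > F = " %6.4f Ftail(8, 46, F_joint)
```

The difference should match the 1126.10 reported by test, which is why a nonsignificant joint test supports the simpler one-way layout.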
test can still access the estimates, even though two tabulations have
intervened. Similarly, anova is integrated with Stata's
regress command for estimating linear regressions. We can review the
underlying regression estimates by typing regress without arguments:
. regress

      Source |       SS       df       MS              Number of obs =      58
-------------+------------------------------           F( 11,    46) =    3.51
       Model |  4259.33851    11  387.212591           Prob > F      =  0.0013
    Residual |  5080.81667    46  110.452536           R-squared     =  0.4560
-------------+------------------------------           Adj R-squared =  0.3259
       Total |  9340.15517    57  163.862371           Root MSE      =   10.51

------------------------------------------------------------------------------
    systolic        Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons                14.2   4.700054     3.02   0.004     4.739282    23.66072
drug
           1          6.2    6.64688     0.93   0.356    -7.179475    19.57948
           2     3.966667   6.363903     0.62   0.536    -8.843206    16.77654
           3         -5.7   7.050081    -0.81   0.423    -19.89108    8.491077
           4    (dropped)
disease
           1          -.6    6.64688    -0.09   0.928    -13.97948    12.77948
           2    -1.366667   6.363903    -0.21   0.831    -14.17654    11.44321
           3    (dropped)
drug*disease
         1 1     9.533333   9.202189     1.04   0.306    -8.989712    28.05638
         1 2     9.216667   9.497521     0.97   0.337    -9.900851    28.33418
         1 3    (dropped)
         2 1     10.43333   9.202189     1.13   0.263    -8.089712    28.95638
         2 2         16.7   9.301675     1.80   0.079      -2.0233     35.4233
         2 3    (dropped)
         3 1     8.433333   10.42169     0.81   0.423    -12.54444    29.41111
         3 2    -2.733333   9.497521    -0.29   0.775    -21.85085    16.38418
         3 3    (dropped)
         4 1    (dropped)
         4 2    (dropped)
         4 3    (dropped)
------------------------------------------------------------------------------
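To see how these coefficients relate to the cell means tabulated earlier, note that the dropped levels (drug 4, disease 3) form the reference cell, so _cons is that cell's mean and every other cell mean is a sum of coefficients. A minimal sketch, retyping the coefficients from the output rather than reading stored results:

```stata
* Reference cell (drug 4, disease 3): the constant alone
display 14.2

* Drug 1, disease 1: _cons + drug[1] + disease[1] + drug*disease[1 1]
display 14.2 + 6.2 + (-.6) + 9.533333

* Drug 3, disease 2: _cons + drug[3] + disease[2] + drug*disease[3 2]
display 14.2 + (-5.7) + (-1.366667) + (-2.733333)
```

These sums should land on the corresponding entries of the table of means (14.20, 29.33, and 4.40), up to rounding in the printed coefficients.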
With our previous test command, we found no evidence that disease or the
drug*disease interaction improves the fit, so a one-way model in drug
appears adequate for these data. We could use either Stata's anova command
or Stata's oneway command to fit a one-way model:
. oneway systolic drug, bonferroni

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      3133.23851      3   1044.41284      9.09     0.0001
 Within groups      6206.91667     54   114.942901
------------------------------------------------------------------------
    Total           9340.15517     57   163.862371

Bartlett's test for equal variances:  chi2(3) =   1.0063  Prob>chi2 = 0.800

       Comparison of Increment in Systolic B.P. by Drug Used
                            (Bonferroni)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |   -.533333
         |      1.000
         |
       3 |   -17.3167   -16.7833
         |      0.001      0.001
         |
       4 |   -12.5667   -12.0333       4.75
         |      0.012      0.017      1.000
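The same one-way model can be fit with anova, and the Bonferroni entries can be checked by hand. The sketch below assumes the comparison uses the pooled within-groups mean square (114.942901 on 54 df), the group sizes from the frequency table above (12 patients on drug 3, 15 on drug 1), and six pairwise comparisons among four drugs. The scalar names are illustrative.

```stata
* One-way model through anova; the F test for drug should match oneway
webuse systolic, clear
anova systolic drug

* Hand check of the Bonferroni p-value for drug 3 versus drug 1
scalar diff  = -17.3167
scalar se    = sqrt(114.942901 * (1/12 + 1/15))
scalar tstat = diff / se
scalar p_raw = 2 * ttail(54, abs(tstat))
display "unadjusted p = " %7.5f p_raw
display "Bonferroni p = " %5.3f min(1, 6*p_raw)
```

Under these assumptions, the adjusted p-value should round to the 0.001 shown in the comparison table.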
Table 7.7 of Winer, Brown, and Michels (1991) provides a repeated-measures
ANOVA example involving both nested and crossed terms. There are four dial
shapes and two methods for calibrating dials. Subjects are nested within
calibration method, and each subject's accuracy score is recorded for
every dial shape. Here is the Stata anova command for this problem:
. webuse t77
(T7.7 -- Winer, Brown, Michels)

. anova score calib / subject|calib shape calib*shape, repeated(shape)

                      Number of obs =      24     R-squared     =  0.8925
                      Root MSE      = 1.11181     Adj R-squared =  0.7939

       Source |  Partial SS    df       MS           F     Prob > F
--------------+----------------------------------------------------
        Model |     123.125    11  11.1931818       9.06     0.0003
              |
        calib |  51.0416667     1  51.0416667      11.89     0.0261
subject|calib |  17.1666667     4  4.29166667
--------------+----------------------------------------------------
        shape |  47.4583333     3  15.8194444      12.80     0.0005
  calib*shape |  7.45833333     3  2.48611111       2.01     0.1662
              |
     Residual |  14.8333333    12  1.23611111
--------------+----------------------------------------------------
        Total |  137.958333    23  5.99818841

Between-subjects error term:  subject|calib
                     Levels:  6         (4 df)
     Lowest b.s.e. variable:  subject
     Covariance pooled over:  calib     (for repeated variable)

Repeated variable: shape
                                    Huynh-Feldt epsilon        =  0.8483
                                    Greenhouse-Geisser epsilon =  0.4751
                                    Box's conservative epsilon =  0.3333

                                 ------------ Prob > F ------------
       Source |     df      F    Regular    H-F      G-G      Box
--------------+----------------------------------------------------
        shape |      3   12.80    0.0005   0.0011   0.0099   0.0232
  calib*shape |      3    2.01    0.1662   0.1791   0.2152   0.2291
     Residual |     12
--------------+----------------------------------------------------
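Two features of this output can be checked by hand. First, the slash in the command makes subject|calib the error term for calib, so that F statistic is a ratio of those two mean squares on (1, 4) degrees of freedom. Second, each corrected p-value comes from multiplying both degrees of freedom of the regular F test by the corresponding epsilon. The sketch below uses the epsilons as printed (rounded to four digits), so its results may differ slightly from the table in the last digit.

```stata
* calib is tested against subject|calib, not against the residual
display "F(1, 4)  = " %5.2f 51.0416667/4.29166667
display "Prob > F = " %6.4f Ftail(1, 4, 51.0416667/4.29166667)

* Corrected p-values for shape: scale (3, 12) df by each epsilon
scalar F_shape = 15.8194444 / 1.23611111
foreach eps in 0.8483 0.4751 0.3333 {
    display "epsilon = `eps'   Prob > F = " %6.4f Ftail(3*`eps', 12*`eps', F_shape)
}
```

The three lines printed by the loop correspond to the H-F, G-G, and Box columns for shape, in that order.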
See New in Stata 10 for more about what was added in Stata Release 10.
References
- Afifi, A. A., and S. P. Azen. 1979. Statistical Analysis: A computer-oriented approach. 2nd ed. New York: Academic Press.
- Winer, B. J., R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New York: McGraw–Hill.