ANOVA / ANCOVA


Balanced and unbalanced designs
Missing cells
Factorial, nested, and mixed designs
Repeated measures
Box, Greenhouse–Geisser, and Huynh–Feldt corrections

Afifi and Azen (1979) fitted a model of the change in systolic blood pressure for 58 patients, each suffering from one of three diseases, who were randomly assigned to one of four drug treatments:

. webuse systolic
(Systolic Blood Pressure Data)

. anova systolic drug disease drug#disease

                           Number of obs =      58     R-squared     =  0.4560
                           Root MSE      = 10.5096     Adj R-squared =  0.3259


                  Source     Partial SS    df       MS           F     Prob > F
               
                   Model     4259.3385    11  387.21259       3.51     0.0013
                          
                    drug     2997.4719     3  999.15729       9.05     0.0001
                 disease     415.87305     2  207.93652       1.88     0.1637
            drug#disease     707.26626     6  117.87771       1.07     0.3958
                         
                Residual     5080.8167    46  110.45254   
               
                   Total     9340.1552    57  163.86237

An important feature of Stata is that it does not have modes or modules. You do not enter an ANOVA module to fit an ANOVA model. The advantage is that all of Stata's features can be interspersed to help you better understand your data. For instance, the data here are almost balanced, as revealed by Stata's table command:

. table drug disease




                Patient's Disease
                1      2      3  Total
   
Drug used  

  1             6      4      5     15
  2             5      4      6     15
  3             3      5      4     12
  4             5      6      5     16
  Total        19     19     20     58

table can also show how the mean change in systolic blood pressure varies by drug and disease:

. table drug disease, statistic(mean systolic) nformat(%8.2f) style(table-right)




                Patient's Disease
                1      2      3  Total
   
Drug used  

  1         29.33  28.25  20.40  26.07
  2         28.00  33.50  18.17  25.53
  3         16.33   4.40   8.50   8.75
  4         13.60  12.83  14.20  13.50
           
    Total   22.79  18.21  15.80  18.88

Stata's test allows you to perform tests directly on the coefficients of the underlying regression model. For instance, we can test if the coefficient on the third drug is equal to the coefficient on the fourth.

. test 3.drug = 4.drug

 ( 1)  3.drug - 4.drug = 0

       F(  1,    46) =    0.13
            Prob > F =    0.7234

We cannot reject equality of the two coefficients at any conventional significance level (p = 0.7234).
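
Because the model includes the drug#disease interaction, the coefficients 3.drug and 4.drug are drug effects at the base level of disease (disease 1). As a minimal sketch, assuming the anova estimates are still in memory, lincom reports the estimated difference itself along with its standard error. For a single restriction, the squared t statistic equals the F statistic from test, so the last line should come out close to 0.13.

```stata
* Estimated difference between the drug 3 and drug 4 coefficients
* (both are effects at disease 1, the base level, because of the interaction)
lincom 3.drug - 4.drug

* lincom saves the estimate and its standard error in r();
* squaring the ratio gives the F statistic for this one-df test
display (r(estimate)/r(se))^2
```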

For more complex tests, contrast often provides a more concise way to specify the test we are interested in and prevents us from having to write tests in terms of the regression coefficients. With contrast, we instead specify our tests in terms of differences in the marginal means for the levels of a particular factor. For example, if we want to compare the third and fourth drugs, we can test the difference in the mean impact on systolic blood pressure separately for each disease using the @ operator. We also use the reverse adjacent operator, ar., to compare the fourth level of the drug with the previous level.

. contrast ar4.drug@disease

Contrasts of marginal linear predictions

Margins      : asbalanced




                       df           F        P>F
   
drug@disease  
 (4 vs 3) 1            1        0.13     0.7234
 (4 vs 3) 2            1        1.76     0.1917
 (4 vs 3) 3            1        0.65     0.4230
      Joint            3        0.85     0.4761
            
 Denominator            46






                 Contrast   Std. Err.     [95% Conf. Interval]
   
drug@disease   
 (4 vs 3) 1      -2.733333   7.675156     -18.18262    12.71595
 (4 vs 3) 2       8.433333   6.363903     -4.376539    21.24321
 (4 vs 3) 3            5.7   7.050081     -8.491077    19.89108
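
Because the model contains the full drug#disease interaction, its predicted cell means coincide with the observed cell means tabulated earlier. A minimal sketch, assuming the anova estimates are still in memory: margins lists the predicted mean for each cell, and the disease 1 contrast above is simply the difference between two of them. The second line uses the rounded means from the table, so it agrees with the contrast only to two decimals.

```stata
* Predicted mean for every drug-by-disease cell
margins drug#disease

* Drug 4 versus drug 3 at disease 1, from the rounded cell means
display 13.60 - 16.33
```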

test and contrast can still access the estimates, even though two tabulations have intervened. Similarly, anova is integrated with Stata’s regress for estimating linear regressions. We can review the underlying regression estimates by typing regress without arguments:

. regress



      Source         SS       df       MS              Number of obs =      58
                                                       F( 11,    46) =    3.51
       Model    4259.33851    11  387.212591           Prob > F      =  0.0013
    Residual    5080.81667    46  110.452536           R-squared     =  0.4560
                                                       Adj R-squared =  0.3259
       Total    9340.15517    57  163.862371           Root MSE      =   10.51







    systolic         Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
   
        drug   
          2      -1.333333   6.363903    -0.21   0.835    -14.14321    11.47654
          3            -13   7.431438    -1.75   0.087    -27.95871    1.958708
          4      -15.73333   6.363903    -2.47   0.017    -28.54321   -2.923461
            
     disease   
          2      -1.083333   6.783944    -0.16   0.874     -14.7387    12.57204
          3      -8.933333   6.363903    -1.40   0.167    -21.74321    3.876539
              
drug#disease   
        2 2       6.583333   9.783943     0.67   0.504    -13.11072    26.27739
        2 3            -.9   8.999918    -0.10   0.921     -19.0159     17.2159
        3 2         -10.85   10.24353    -1.06   0.295    -31.46916    9.769157
        3 3            1.1   10.24353     0.11   0.915    -19.51916    21.71916
        4 2       .3166667   9.301675     0.03   0.973    -18.40663    19.03997
        4 3       9.533333   9.202189     1.04   0.306    -8.989712    28.05638
              
       _cons      29.33333   4.290543     6.84   0.000     20.69692    37.96975
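
The regression coefficients are deviations from the base cell (drug 1, disease 1), whose mean is the constant. As an illustrative sketch, any other cell mean can be rebuilt by adding the relevant main effects and interaction. Here we do so for drug 4 and disease 3, first with lincom and then by hand using the coefficients printed above. Both should give about 14.20, the value in the table of means.

```stata
* Mean for drug 4, disease 3: constant + both main effects + interaction
lincom _cons + 4.drug + 3.disease + 4.drug#3.disease

* The same sum by hand, using the coefficients printed above
display 29.33333 - 15.73333 - 8.933333 + 9.533333
```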

In our original estimation, neither the main effect of disease nor the drug#disease interaction was statistically significant. We can test the two terms jointly, which amounts to comparing our two-way factorial model with a simpler one-way layout:

. test disease drug#disease



                  Source     Partial SS    df       MS           F     Prob > F
   
    disease drug#disease         1126.1     8    140.7625       1.27     0.2801
                Residual     5080.8167    46  110.45254
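
The joint test is an F test comparing nested models. As a rough check using figures printed on this page, the partial SS of 1126.1 is the amount by which the residual SS rises when disease and drug#disease are dropped: the within-groups SS of the one-way fit (6206.91667) minus the two-way residual SS. The F statistic divides the corresponding mean square by the two-way residual mean square. Up to rounding, the three lines below should reproduce 1126.1, 1.27, and 0.2801.

```stata
* Increase in residual SS when the 8 df are dropped from the model
display 6206.91667 - 5080.81667

* F statistic and its p-value on (8, 46) degrees of freedom
display (1126.1/8) / 110.45254
display Ftail(8, 46, (1126.1/8) / 110.45254)
```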

The joint test above does not reject the one-way model (p = 0.2801), so a one-way layout in drug appears adequate for these data. We could use either Stata's anova or Stata's oneway to fit a one-way model.

. oneway systolic drug, bonferroni



                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F

Between groups      3133.23851      3   1044.41284      9.09     0.0001
 Within groups      6206.91667     54   114.942901

    Total           9340.15517     57   163.862371




Bartlett's test for equal variances:  chi2(3) =   1.0063  Prob>chi2 = 0.800

            Comparison of Increment in Systolic B.P. by Drug Used
                                (Bonferroni)


Row Mean-  
Col Mean             1          2          3
   
       2      -.533333
                1.000
          
       3      -17.3167   -16.7833
                0.001      0.001
          
       4      -12.5667   -12.0333       4.75
                0.012      0.017      1.000
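
oneway is not the only route to these comparisons. A sketch of an alternative, assuming you are willing to replace the two-way estimates currently in memory: refit the one-way model with anova and ask pwcompare for Bonferroni-adjusted pairwise comparisons. With four drugs there are six pairs, so each Bonferroni p-value is the unadjusted p-value multiplied by 6 and capped at 1. The layout and sign convention of the pwcompare output differ from the matrix above, but it should lead to the same conclusions.

```stata
* Refit the one-way model (this replaces the stored two-way results)
anova systolic drug

* All pairwise differences in drug means with Bonferroni-adjusted tests
pwcompare drug, mcompare(bonferroni) effects
```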

Table 7.7 of Winer, Brown, and Michels (1991) provides a repeated-measures ANOVA example involving both nested and crossed terms. There are four dial shapes and two methods for calibrating dials. Subjects are nested within the calibration method, and an accuracy score is obtained.

Here is Stata's anova for this problem.

. webuse t77
(T7.7 -- Winer, Brown, Michels)

. anova score calib / subject|calib shape calib#shape , repeated(shape)

                           Number of obs =      24     R-squared     =  0.8925
                           Root MSE      = 1.11181     Adj R-squared =  0.7939


                    Source     Partial SS    df       MS           F     Prob > F
               
                     Model        123.125    11  11.1931818       9.06     0.0003
                        
                     calib     51.0416667     1  51.0416667      11.89     0.0261
             subject|calib     17.1666667     4  4.29166667
               
                     shape     47.4583333     3  15.8194444      12.80     0.0005
               calib#shape     7.45833333     3  2.48611111       2.01     0.1662
                       
                  Residual     14.8333333    12  1.23611111
               
                     Total     137.958333    23  5.99818841



Between-subjects error term:  subject|calib
                     Levels:  6         (4 df)
     Lowest b.s.e. variable:  subject
     Covariance pooled over:  calib     (for repeated variable)

Repeated variable: shape
                                          Huynh-Feldt epsilon        =  0.8483
                                          Greenhouse-Geisser epsilon =  0.4751
                                          Box's conservative epsilon =  0.3333



                                           ------------   Prob > F  ------------

                  Source        df      F    Regular    H-F      G-G      Box
                  
                   shape         3    12.80   0.0005   0.0011   0.0099   0.0232
             calib#shape         3     2.01   0.1662   0.1791   0.2152   0.2291
                Residual        12
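
Each correction leaves the F statistic unchanged and multiplies both degrees of freedom by the corresponding epsilon before the p-value is computed. A minimal sketch for the shape effect, typing in the rounded F and epsilons from the output above, so the results will match the table only approximately. The last line shows that calib is tested against the subject|calib mean square, not the residual.

```stata
* Regular p-value for shape: F(3, 12) = 12.80
display Ftail(3, 12, 12.80)

* Huynh-Feldt, Greenhouse-Geisser, and Box: multiply both df by epsilon
display Ftail(3*0.8483, 12*0.8483, 12.80)
display Ftail(3*0.4751, 12*0.4751, 12.80)
display Ftail(3*0.3333, 12*0.3333, 12.80)

* F statistic for calib uses subject|calib as the error term
display 51.0416667 / 4.29166667
```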

References

Afifi, A. A., and S. P. Azen. 1979. Statistical Analysis: A computer-oriented approach. 2nd ed. New York: Academic Press.

Winer, B. J., R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New York: McGraw–Hill.

