How can you specify a term other than residual error as the denominator in
a single degree-of-freedom F test after ANOVA?
Title:   Single degree-of-freedom tests after ANOVA using nonresidual error
Author:  Kenneth Higbee, StataCorp
Date:    August 2001; minor revisions July 2011
Question:
How can I compute single degree-of-freedom tests after ANOVA and use some
other term besides residual error in the denominator of the resulting
F tests?
Answer:
Testing terms or combinations of terms from an ANOVA model with something
other than residual error as the denominator of the F test is handled
directly by the test command. For instance, as shown in the section titled
“Testing effects” in [R] anova postestimation, after
. anova output machine operator|machine
we can test the machine term using operator|machine as the error with
. test machine / operator|machine
We could have gotten this test directly in the ANOVA with
. anova output machine / operator|machine /
So, testing a term in the ANOVA with a nonresidual error term is easy. The
question here is how to apply a nonresidual error term to a single
degree-of-freedom test.
I will illustrate by showing how to obtain the pairwise comparison tests
among the five levels of the machine term in the ANOVA of the machine data
used in [R] anova.
. anova output machine / operator|machine /
Number of obs = 57 R-squared = 0.8661
Root MSE = 1.47089 Adj R-squared = 0.8077
Source | Partial SS df MS F Prob > F
-----------------+----------------------------------------------------
Model | 545.822288 17 32.1071934 14.84 0.0000
|
machine | 430.980792 4 107.745198 13.82 0.0001
operator|machine | 101.353804 13 7.79644648
-----------------+----------------------------------------------------
operator|machine | 101.353804 13 7.79644648 3.60 0.0009
|
Residual | 84.3766582 39 2.16350406
-----------------+----------------------------------------------------
Total | 630.198947 56 11.2535526
If you have not read the section titled “Testing coefficients and contrasts
of margins” in [R] anova postestimation and the FAQ “How can I form various
tests comparing the different levels of a categorical variable after anova
or regress?”, please do so. The additional twist here is that we wish to use
something other than residual error as the error term for our tests.
We are interested in the 10 pairwise comparison tests involving the five
levels of machine (1 versus 2, 1 versus 3, 1 versus 4, 1 versus 5, 2 versus
3, 2 versus 4, 2 versus 5, 3 versus 4, 3 versus 5, and 4 versus 5). The
appropriate error term for testing machine is operator|machine, and we also
wish to use it as the error term for the 10 single degree-of-freedom
pairwise comparison tests.
The first step is to place the sums of squares (SS) and degrees of freedom
(df) for the operator|machine term into some scalars that we can use later.
The test command will return the values we need.
. test operator|machine
Source | Partial SS df MS F Prob > F
-----------------+----------------------------------------------------
operator|machine | 101.353804 13 7.79644648 3.60 0.0009
Residual | 84.3766582 39 2.16350406
The return list command shows which items have been left behind by the test
command.
. return list
scalars:
r(ss) = 101.3538042240784
r(df) = 13
r(rss) = 84.37665820758086
r(df_r) = 39
r(F) = 3.603619995522845
r(ss) and r(df) are the SS and df for operator|machine (we
wish these to be used in the denominator of the F
tests we are about to perform). r(rss) and r(df_r) are the SS
and df for residual error. We will place all four values into scalars for
our later use.
. scalar den_ss = r(ss)
. scalar den_df = r(df)
. scalar res_ss = r(rss)
. scalar res_df = r(df_r)
I prefer forming single degree-of-freedom tests after a complicated ANOVA by
using the equivalent cell means ANOVA model. We first generate one variable
corresponding to the cells of our more complicated ANOVA model.
. egen z = group(machine operator)
. table machine operator, c(mean z)
----------------------------------
five | operator nested in
brands of | machine
machine | 1 2 3 4
----------+-----------------------
1 | 1 2 3 4
2 | 5 6 7 8
3 | 9 10 11
4 | 12 13 14 15
5 | 16 17 18
----------------------------------
Here, there are 18 cells (of a possible 20 for this design). Machine brands
3 and 5 each have a missing cell. This table indicates, for example, that
level 1 of machine corresponds to the first four cells (a value of z of 1,
2, 3, or 4).
We use the noconstant option of
anova and our cell indicator variable z to obtain
the cell means ANOVA.
. anova output bn.z, noconstant
Number of obs = 57 R-squared = 0.9912
Root MSE = 1.47089 Adj R-squared = 0.9871
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 9512.17345 18 528.454081 244.26 0.0000
|
z | 9512.17345 18 528.454081 244.26 0.0000
|
Residual | 84.3766582 39 2.16350406
-----------+----------------------------------------------------
Total | 9596.55011 57 168.360528
This ANOVA table is not really of interest to us. However, the single
degree-of-freedom tests that we wish to perform are, in my opinion, easier
to form from this ANOVA.
We first perform the test letting residual error be the error term for the
test. Here is the test comparing machine 1 with 2:
. test (i1.z+i2.z+i3.z+i4.z)/4 == (i5.z + i6.z + i7.z + i8.z)/4
( 1) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .25*5.z - .25*6.z - .25*7.z -
.25*8.z = 0
F( 1, 39) = 33.17
Prob > F = 0.0000
We could have obtained the same result with
. mat c12 = (1,1,1,1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0)
. test , test(c12)
This is not the F test or p-value that we
really want. However, we can grab some values that are returned after this
test command and use them with the den_ss, den_df,
res_ss, and res_df scalars that we saved earlier to obtain the
desired F test and p-value.
. return list
scalars:
r(p) = 1.11948557182e-06
r(df) = 1
r(F) = 33.16595383079808
r(df_r) = 39
r(drop) = 0
. scalar num_df = r(df)
. scalar num_ss = r(F)*num_df*res_ss/res_df
. scalar newF = (num_ss/num_df)/(den_ss/den_df)
. scalar newp = Ftail(num_df,den_df,newF)
The numerator degrees of freedom is simply 1 here, but I place it in a
scalar (num_df) so that if you apply these same steps to some other
situation where you are combining several tests (using the accum option of
test, for instance), the following steps will continue to work. The
numerator sum of squares is obtained using some algebra on the formula for
an F statistic: an F statistic is
(numerator_SS/numerator_df)/(denominator_SS/denominator_df). The F test
just produced has residual error in the denominator, so solving for the
numerator sum of squares gives the reported F value times the numerator df
times the residual mean square; we place that result in the scalar num_ss.
The new F test value and p-value are then computed using the
operator|machine SS and df in the denominator.
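As a quick numeric check of this algebra (my own addition, using the values
displayed above), the numerator SS is about 33.166 × 1 × (84.3767/39) ≈
71.75; dividing this by the operator|machine mean square, 101.3538/13 ≈
7.7964, gives an F value of about 9.20, which matches the result displayed
next:
. display 33.165954*1*(84.3766582/39)/(101.353804/13)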
Many people think it is important to also adjust the p-values when
performing multiple comparisons. Here, we are interested in 10 single
degree-of-freedom tests, so we can apply the Bonferroni correction by
simply multiplying each p-value by 10 (capping the result at 1). We show
the F test, p-value, and Bonferroni-corrected p-value next.
. disp "F(" num_df "," den_df ") = " newF ", p-value = " newp
> ", Bonferroni p-value = " min(1,newp*10)
F(1,13) = 9.2035103, p-value = .00959573, Bonferroni p-value = .09595729
We could gather the unadjusted p-values for all 10 tests into a matrix and
then use the _mtest command to compute the corrected p-values. We will show
this later.
The test comparing machine 1 with machine 3 is obtained similarly. Refer to
the table showing how the z variable corresponds to our original ANOVA
design to see how we constructed this test.
. test (i1.z+i2.z+i3.z+i4.z)/4 = (i9.z+i10.z+i11.z)/3
( 1) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .3333333*9.z - .3333333*10.z -
.3333333*11.z = 0
F( 1, 39) = 10.20
Prob > F = 0.0028
Again we ignore this output and compute the appropriate
F test and p-value(s).
. scalar num_df = r(df)
. scalar num_ss = r(F)*num_df*res_ss/res_df
. scalar newF = (num_ss/num_df)/(den_ss/den_df)
. scalar newp = Ftail(num_df,den_df,newF)
. disp "F(" num_df "," den_df ") = " newF ", p-value = " newp
> ", Bonferroni p-value = " min(1,newp*10)
F(1,13) = 2.8306705, p-value = .11632831, Bonferroni p-value = 1
There are eight more pairwise tests of interest. They can be computed in
the same way. I omit them for brevity. These steps could be done
interactively or by writing a do-file or ado-file.
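If you want to script the remaining comparisons, here is a minimal do-file
sketch. The program name pairtest and this wrapper are my own and do not
appear in the original FAQ; the sketch assumes the cell means ANOVA above is
the current estimation result and that the scalars den_ss, den_df, res_ss,
and res_df saved earlier still exist.
// pairtest: redo a one-row contrast with residual error, then rescale it
// to use the operator|machine SS and df saved in den_ss and den_df.
capture program drop pairtest
program define pairtest
    args cmat                          // name of a 1 x 18 contrast matrix
    quietly test , test(`cmat')        // F test using residual error
    scalar num_df = r(df)
    scalar num_ss = r(F)*num_df*res_ss/res_df
    scalar newF   = (num_ss/num_df)/(den_ss/den_df)
    scalar newp   = Ftail(num_df,den_df,newF)
    display "F(" num_df "," den_df ") = " newF ", p-value = " newp ///
            ", Bonferroni p-value = " min(1,newp*10)
end
// Example: machine 1 versus machine 4 (cells 1-4 versus cells 12-15).
// The weights need not be divided by 4 because rescaling a one-row
// contrast does not change the F test.
matrix c14 = (1,1,1,1, 0,0,0,0, 0,0,0, -1,-1,-1,-1, 0,0,0)
pairtest c14
This should reproduce the machine 1 versus machine 4 comparison that appears
as row r3 in the results shown below.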
The test() and mtest() options of test after anova provide a shortcut. We
create a matrix with 10 rows corresponding to the tests of interest. There
are many ways to form such a matrix; here is one example.
. mat m4 = J(1,4,1/4)
. mat m3 = J(1,3,1/3)
. mat z4 = J(1,4,0)
. mat z3 = J(1,3,0)
. // mach = 1 2 3 4 5
. mat c = (m4,-m4, z3, z4, z3 \ /// 1 vs 2
> m4, z4,-m3, z4, z3 \ /// 1 vs 3
> m4, z4, z3,-m4, z3 \ /// 1 vs 4
> m4, z4, z3, z4,-m3 \ /// 1 vs 5
> z4, m4,-m3, z4, z3 \ /// 2 vs 3
> z4, m4, z3,-m4, z3 \ /// 2 vs 4
> z4, m4, z3, z4,-m3 \ /// 2 vs 5
> z4, z4, m3,-m4, z3 \ /// 3 vs 4
> z4, z4, m3, z4,-m3 \ /// 3 vs 5
> z4, z4, z3, m4,-m3) // 4 vs 5
c is a 10 × 18 matrix with each of the rows providing one of
the 10 comparisons of interest.
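As a quick sanity check (my own addition, not in the original FAQ), you can
confirm those dimensions before handing c to test:
. display rowsof(c) " by " colsof(c)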
. test , test(c) mtest(noadjust)
( 1) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .25*5.z - .25*6.z - .25*7.z -
.25*8.z = 0
( 2) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .3333333*9.z - .3333333*10.z -
.3333333*11.z = 0
( 3) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .25*12.z - .25*13.z - .25*14.z
- .25*15.z = 0
( 4) .25*1bn.z + .25*2.z + .25*3.z + .25*4.z - .3333333*16.z - .3333333*17.z -
.3333333*18.z = 0
( 5) .25*5.z + .25*6.z + .25*7.z + .25*8.z - .3333333*9.z - .3333333*10.z -
.3333333*11.z = 0
( 6) .25*5.z + .25*6.z + .25*7.z + .25*8.z - .25*12.z - .25*13.z - .25*14.z -
.25*15.z = 0
( 7) .25*5.z + .25*6.z + .25*7.z + .25*8.z - .3333333*16.z - .3333333*17.z -
.3333333*18.z = 0
( 8) .3333333*9.z + .3333333*10.z + .3333333*11.z - .25*12.z - .25*13.z -
.25*14.z - .25*15.z = 0
( 9) .3333333*9.z + .3333333*10.z + .3333333*11.z - .3333333*16.z -
.3333333*17.z - .3333333*18.z = 0
(10) .25*12.z + .25*13.z + .25*14.z + .25*15.z - .3333333*16.z - .3333333*17.z
- .3333333*18.z = 0
Constraint 1 dropped
Constraint 2 dropped
Constraint 3 dropped
Constraint 6 dropped
Constraint 7 dropped
Constraint 10 dropped
---------------------------------------
| F(df,39) df p
-------+-------------------------------
(1) | 33.17 1 0.0000 #
(2) | 10.20 1 0.0028 #
(3) | 182.36 1 0.0000 #
(4) | 55.31 1 0.0000 #
(5) | 5.25 1 0.0274 #
(6) | 49.72 1 0.0000 #
(7) | 2.28 1 0.1393 #
(8) | 85.30 1 0.0000 #
(9) | 14.37 1 0.0005 #
(10) | 31.17 1 0.0000 #
-------+-------------------------------
all | 49.80 4 0.0000
---------------------------------------
# unadjusted p-values
. mat noadj = r(mtest)
. mat list noadj
noadj[10,3]
F df p
r1 33.165954 1 1.119e-06
r2 10.200661 1 .00277728
r3 182.35752 1 2.754e-16
r4 55.309036 1 5.424e-09
r5 5.2530861 1 .02738983
r6 49.722181 1 1.822e-08
r7 2.2775523 1 .13931855
r8 85.304594 1 2.312e-11
r9 14.371257 1 .00050884
r10 31.170658 1 1.964e-06
We send test our matrix using the test() option and use mtest(noadjust) to
obtain each of the unadjusted single degree-of-freedom tests. These tests
use residual error as the error term. We can follow similar steps as
before, starting with this result, to obtain the single degree-of-freedom
tests with the operator|machine term as the error term.
Remember, we previously ran:
. test operator|machine
. scalar den_ss = r(ss)
. scalar den_df = r(df)
. scalar res_ss = r(rss)
. scalar res_df = r(df_r)
after running the standard overparameterized ANOVA. We now execute
. local rows = rowsof(noadj)
. mat num_ss = J(`rows',1,.)
. mat new = J(`rows',3,.)
. mat colnames new = F df p
. mat new[1,2] = noadj[1...,2]
. forvalues i = 1/`rows' {
2. mat num_ss[`i',1] = noadj[`i',1]*noadj[`i',2]*res_ss/res_df
3. mat new[`i',1] = (num_ss[`i',1]/noadj[`i',2])/(den_ss/den_df)
4. mat new[`i',3] = Ftail(new[`i',2],den_df,new[`i',1])
5. }
. mat list new
new[10,3]
F df p
r1 9.2035103 1 .00959573
r2 2.8306705 1 .11632831
r3 50.603981 1 7.890e-06
r4 15.348188 1 .00176614
r5 1.4577248 1 .24880133
r6 13.797842 1 .00259704
r7 .6320179 1 .44088968
r8 23.671917 1 .00030856
r9 3.9880057 1 .06720042
r10 8.6498183 1 .0114672
Matrix new contains the tests using operator|machine as the error term. We
can obtain multiple-comparison corrections of the p-values by calling
_mtest.
. _mtest adjust new , mtest(bonferroni) pindex(3) append
. mat new_adj = r(result)
. mat list new_adj
new_adj[10,4]
F df p adjustp
r1 9.2035103 1 .00959573 .09595729
r2 2.8306705 1 .11632831 1
r3 50.603981 1 7.890e-06 .0000789
r4 15.348188 1 .00176614 .01766142
r5 1.4577248 1 .24880133 1
r6 13.797842 1 .00259704 .0259704
r7 .6320179 1 .44088968 1
r8 23.671917 1 .00030856 .00308561
r9 3.9880057 1 .06720042 .67200415
r10 8.6498183 1 .0114672 .11467197
Here we asked for Bonferroni-adjusted p-values.
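As a by-hand check (my own addition), the adjusted value in any row is just
the unadjusted p-value multiplied by 10 and capped at 1; for the first
comparison, this should reproduce the .09595729 shown above:
. display min(1, el(new,1,3)*10)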