How does the anova command handle collinearity?

Title   The anova command and collinearity
Author William Sribney, StataCorp
Date March 1997; minor revisions July 2007

Here’s an example that illustrates what happens.

. input woman twin  
    
         woman       twin  
  1.         1          1  
  2.         2          1  
  3.         3          2  
  4.         4          2  
  5.         5          3  
  6.         6          3  
  7. end

. tab woman, gen(w)

      woman |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1       16.67       16.67
          2 |          1       16.67       33.33
          3 |          1       16.67       50.00
          4 |          1       16.67       66.67
          5 |          1       16.67       83.33
          6 |          1       16.67      100.00
------------+-----------------------------------
      Total |          6      100.00

. tab twin, gen(t)

       twin |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2       33.33       33.33
          2 |          2       33.33       66.67
          3 |          2       33.33      100.00
------------+-----------------------------------
      Total |          6      100.00

. gen t1w1 = t1*w1 - t1*w2

. gen t2w3 = t2*w3 - t2*w4

. gen t3w5 = t3*w5 - t3*w6

. list w* t*, nodisplay

     +--------------------------------------------------------------------------------+
     | woman   w1   w2   w3   w4   w5   w6   twin   t1   t2   t3   t1w1   t2w3   t3w5 |
     |--------------------------------------------------------------------------------|
  1. |     1    1    0    0    0    0    0      1    1    0    0      1      0      0 |
  2. |     2    0    1    0    0    0    0      1    1    0    0     -1      0      0 |
  3. |     3    0    0    1    0    0    0      2    0    1    0      0      1      0 |
  4. |     4    0    0    0    1    0    0      2    0    1    0      0     -1      0 |
  5. |     5    0    0    0    0    1    0      3    0    0    1      0      0      1 |
     |--------------------------------------------------------------------------------|
  6. |     6    0    0    0    0    0    1      3    0    0    1      0      0     -1 |
     +--------------------------------------------------------------------------------+

. gen x = 12 - int(2*uniform())

. expand x
(64 observations created)

. set seed 123

. gen y = uniform()

. anova y woman twin

                        Number of obs =      70     R-squared     =  0.0801
                        Root MSE      = .288572     Adj R-squared =  0.0082

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .463881941     5  .092776388       1.11     0.3618
                     |
               woman |  .463881941     5  .092776388       1.11     0.3618
                twin |           0     0
                     |
            Residual |  5.32951143    64  .083273616   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223   


. regress y w1-w5 t1-t3

  Source |       SS       df       MS              Number of obs =      70
---------+------------------------------           F(  5,    64) =    1.11
   Model |  .463881941     5  .092776388           Prob > F      =  0.3618
Residual |  5.32951143    64  .083273616           R-squared     =  0.0801
---------+------------------------------           Adj R-squared =  0.0082
   Total |  5.79339337    69  .083962223           Root MSE      =  .28857

--------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
      w1 |  (dropped)
      w2 |  -.0548665   .1204566    -0.46   0.650    -.2955063    .1857732
      w3 |  -.0594359   .1178089    -0.50   0.616    -.2947862    .1759144
      w4 |  (dropped)
      w5 |    .209396   .1204566     1.74   0.087    -.0312437    .4500358
      t1 |  (dropped)
      t2 |  -.1140098   .1178089    -0.97   0.337    -.3493601    .1213405
      t3 |  -.1589105   .1178089    -1.35   0.182    -.3942608    .0764398
   _cons |   .5604848   .0833035     6.73   0.000      .394067    .7269026
--------------------------------------------------------------------------    

The regress model is obviously collinear, but so was the anova model. The anova command keeps terms from left to right. Hence, it “dropped” the twin effect (i.e., all the twin dummies).
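One way to see the collinearity directly is to check that each twin dummy is the sum of two woman dummies. The following is a minimal sketch using the w1-w6 and t1-t3 variables created above; it was not part of the original session and only verifies the nesting.

```stata
* Each woman belongs to exactly one twin pair, so every row of this
* table should have a single nonzero cell
tabulate woman twin

* Each twin dummy equals the sum of its two woman dummies, so the twin
* terms add nothing once all the woman terms are in the model.
* assert stops with an error if any observation violates the equality.
assert t1 == w1 + w2
assert t2 == w3 + w4
assert t3 == w5 + w6
```

If all three assert commands pass silently, the twin dummies lie in the span of the woman dummies, which is why anova reports 0 df for twin when woman comes first.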

. anova y twin woman

                        Number of obs =      70     R-squared     =  0.0801
                        Root MSE      = .288572     Adj R-squared =  0.0082

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .463881941     5  .092776388       1.11     0.3618
                     |
                twin |  .062313132     2  .031156566       0.37     0.6894
               woman |  .290114367     3  .096704789       1.16     0.3315
                     |
            Residual |  5.32951143    64  .083273616   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223   

Again, anova keeps terms from left to right; here it kept only three out of the six women dummies.

. anova y twin twin*woman

                       Number of obs =      70     R-squared     =  0.0801
                       Root MSE      = .288572     Adj R-squared =  0.0082

             Source |  Partial SS    df       MS           F     Prob > F
         -----------+----------------------------------------------------
              Model |  .463881941     5  .092776388       1.11     0.3618
                    |
               twin |  .175133264     2  .087566632       1.05     0.3554
         twin*woman |  .290114367     3  .096704789       1.16     0.3315
                    |
           Residual |  5.32951143    64  .083273616   
         -----------+----------------------------------------------------
              Total |  5.79339337    69  .083962223   

Below, we do the equivalent regression.

. regress y t1 t2 t1w1 t2w3 t3w5

  Source |       SS       df       MS              Number of obs =      70
---------+------------------------------           F(  5,    64) =    1.11
   Model |  .463881941     5  .092776388           Prob > F      =  0.3618
Residual |  5.32951143    64  .083273616           R-squared     =  0.0801
---------+------------------------------           Adj R-squared =  0.0082
   Total |  5.79339337    69  .083962223           Root MSE      =  .28857

--------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
      t1 |   .0267792   .0851757     0.31   0.754    -.1433788    .1969372
      t2 |  -.0895153   .0842448    -1.06   0.292    -.2578136     .078783
    t1w1 |   .0274333   .0602283     0.46   0.650    -.0928866    .1477531
    t2w3 |   -.029718   .0589044    -0.50   0.616    -.1473931    .0879572
    t3w5 |    .104698   .0602283     1.74   0.087    -.0156219    .2250179
   _cons |   .5062724   .0602283     8.41   0.000     .3859525    .6265922
--------------------------------------------------------------------------


. test t1 t2

 ( 1)  t1 = 0.0
 ( 2)  t2 = 0.0

       F(  2,    64) =    1.05
            Prob > F =    0.3554


I made the interactions orthogonal, which is essentially what anova does.

. test t1w1 t2w3 t3w5

 ( 1)  t1w1 = 0.0
 ( 2)  t2w3 = 0.0
 ( 3)  t3w5 = 0.0

       F(  3,    64) =    1.16
            Prob > F =    0.3315

You understand the above Wald tests. The anova partial SS and their tests are equivalent. I call them “added-last” tests for obvious reasons.

The test of t1 = t2 = 0 is a test of

y = t1w1 t2w3 t3w5 t1 t2
vs.
y = t1w1 t2w3 t3w5

(Comment: It’s kind of a stupid test in this case. Obviously, partial SS and their tests make more sense for different covariates rather than interactions and main effects.)
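The added-last comparison can also be computed by hand from the two residual sums of squares. This is a sketch under the assumption that the variables from the session above are still in memory; the scalar names rss_small, rss_full, and df_full are made up for illustration.

```stata
* Reduced model: interactions only
quietly regress y t1w1 t2w3 t3w5
scalar rss_small = e(rss)

* Full model: interactions plus the twin main effects
quietly regress y t1 t2 t1w1 t2w3 t3w5
scalar rss_full = e(rss)
scalar df_full  = e(df_r)

* F = (drop in residual SS / number of restrictions) / full-model residual MS
scalar F = ((rss_small - rss_full)/2) / (rss_full/df_full)
display "F = " F "   Prob > F = " Ftail(2, df_full, F)
```

The displayed values should agree with what test t1 t2 reports, because the Wald test of t1 = t2 = 0 and this nested-model F test are the same test in a linear regression.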

The following explains sequential SS:

. anova y twin twin*woman, seq

                       Number of obs =      70     R-squared     =  0.0801
                       Root MSE      = .288572     Adj R-squared =  0.0082

              Source |    Seq. SS     df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .463881941     5  .092776388       1.11     0.3618
                     |
                twin |  .173767574     2  .086883787       1.04     0.3582
          twin*woman |  .290114367     3  .096704789       1.16     0.3315
                     |
            Residual |  5.32951143    64  .083273616   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223   

. anova y twin

                       Number of obs =      70     R-squared     =  0.0300
                       Root MSE      = .289612     Adj R-squared =  0.0010

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .173767574     2  .086883787       1.04     0.3605
                     |
                twin |  .173767574     2  .086883787       1.04     0.3605
                     |
            Residual |   5.6196258    67  .083875012   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223   

The twin SS are the same in the two preceding anovas. The difference in the tests is in the denominator of the F. The residuals are obviously different. I (and my profs) prefer the second for testing “main effects”.
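The arithmetic behind the two F statistics can be checked with display, using the mean squares printed in the two tables above. This is only a sketch for illustration; the numbers are copied from the output, not recomputed from the data.

```stata
* Same numerator (the twin MS), different denominators:
* first the residual MS from the model with twin*woman (64 df),
* then the residual MS from the twin-only model (67 df)
display .086883787/.083273616
display .086883787/.083875012

* The p-values differ because both the F ratio and the
* denominator degrees of freedom change
display Ftail(2, 64, .086883787/.083273616)
display Ftail(2, 67, .086883787/.083875012)
```

Both ratios round to 1.04, so the visible difference between the two tables is in Prob > F.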

Clearly, I take a model-building approach to anova and think in terms of the equivalent regression.

You can type regress after running anova to view an equivalent regression.
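For example, the following sketch fits one of the models above and then replays it as a regression. It uses the same syntax as the rest of this FAQ; in Stata 11 and later the interaction would instead be written twin#woman, so adjust the anova line to your version.

```stata
* Fit the anova model used above, then replay it as a regression
anova y twin twin*woman
regress
```

The coefficient table that regress shows depends on which dummies anova kept, so it need not match the hand-built regression on t1, t2, t1w1, t2w3, and t3w5 term by term, even though the model fit is the same.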

When you specify an interaction, anova always includes the corresponding main effects as part of that interaction term, even if you do not list them explicitly.

. anova y twin*woman

                        Number of obs =      70     R-squared     =  0.0801
                        Root MSE      = .288572     Adj R-squared =  0.0082

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .463881941     5  .092776388       1.11     0.3618
                     |
          twin*woman |  .463881941     5  .092776388       1.11     0.3618
                     |
            Residual |  5.32951143    64  .083273616   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223   

The sequential option does the same.

. anova y twin*woman twin, seq

                        Number of obs =      70     R-squared     =  0.0801
                        Root MSE      = .288572     Adj R-squared =  0.0082

              Source |    Seq. SS     df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  .463881941     5  .092776388       1.11     0.3618
                     |
          twin*woman |  .463881941     5  .092776388       1.11     0.3618
                twin |           0     0
                     |
            Residual |  5.32951143    64  .083273616   
          -----------+----------------------------------------------------
               Total |  5.79339337    69  .083962223 