This FAQ is for Stata 10 and older versions of Stata. Stata 11 introduced the margins command, which superseded adjust.

How can I produce adjusted means after ANOVA?

Title   Producing adjusted means after ANOVA
Author Kenneth Higbee, StataCorp
Date March 2001; updated April 2005

Question:

Someone posed the following question:

I am running some simple ANOVAs and wanted also to produce the adjusted means. The command is a 3-way ANOVA with a single 2-way interaction. All the predictors are dichotomous (0/1) variables. There were a few problems with the output.

First, when I run

. anova opportu2 volsex frcsex volsex*frcsex q3

and then

. adjust q3 if e(sample), by(volsex frcsex) se ci

I get a table with all cells missing.

I then decided to run

. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci

This did work! What is going on?

Answer:

The result you show, comparing when q3 was used as a categorical variable and when it was specified to be a continuous variable in the ANOVA, does not surprise me. Let me explain why using the auto data.

. sysuse auto
(1978 Automobile Data)

. gen z = trunk < 14

. anova wei rep for rep*for z

                           Number of obs =      69     R-squared     =  0.6079
                           Root MSE      =  528.54     Adj R-squared =  0.5556

                  Source |  Partial SS    df       MS           F     Prob > F
           --------------+----------------------------------------------------
                   Model |  25984464.5     8  3248058.07      11.63     0.0000
                         |
                   rep78 |  1524294.09     4  381073.522       1.36     0.2571
                 foreign |  3521325.49     1  3521325.49      12.61     0.0008
           rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
                       z |  3248513.88     1  3248513.88      11.63     0.0012
                         |
                Residual |  16761251.4    60   279354.19   
           --------------+----------------------------------------------------
                   Total |  42745715.9    68   628613.47 

I used the 0/1 variable z as a categorical variable in the anova above.

Now, just as you experienced, when I use adjust to set z to its mean, I get nothing useful.

. adjust z if e(sample), by(for rep) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)

----------------------------------------
          |      Repair Record 1978
 Car type |    1     2     3     4     5
----------+-----------------------------
 Domestic |
          |
          |
          |
  Foreign |
          |
          |
----------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]

Notice the messages about missing values generated.

Think about the parameterization used by ANOVA models. Categorical variables enter the design matrix as a set of indicator (also called dummy) variables. For instance, rep78, which has five levels, becomes 5 columns in the ANOVA design matrix (one of these columns will later be dropped in the estimation process, since there are only 4 degrees of freedom for the 5 levels). foreign becomes 2 columns in the design matrix, and one of them will be dropped later in the estimation process. The same is true for the z variable.
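To see what such indicator columns look like, they can be built by hand. This is only an illustrative sketch, not what anova does internally; the stub names rep_ and z_ are made up for the example.

```stata
* Illustrative only: build the indicator columns by hand.
sysuse auto, clear
generate z = trunk < 14

* One indicator per level: rep_1 ... rep_5 for rep78,
* and z_1 (z == 0) and z_2 (z == 1) for z.
tabulate rep78, generate(rep_)
tabulate z, generate(z_)

* Within each set the indicators sum to 1, so once a constant is in the
* model one column per set is redundant and gets dropped.
list rep78 rep_1-rep_5 z z_1 z_2 in 1/5
```

Observations with missing rep78 get missing values in every rep_ indicator, which is consistent with the ANOVA using 69 rather than all 74 observations.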

Here is a look at the underlying regression for the ANOVA above:

    . regress
    
          Source |       SS       df       MS              Number of obs =      69
    -------------+------------------------------           F(  8,    60) =   11.63
           Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
        Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
    -------------+------------------------------           Adj R-squared =  0.5556
           Total |  42745715.9    68   628613.47           Root MSE      =  528.54
    
    ------------------------------------------------------------------------------
          weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------------------------------------------------
    _cons            2182.261   187.7289    11.62   0.000     1806.748    2557.775
    rep78
               1         1140   528.5397     2.16   0.035     82.76324    2197.237
               2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
               3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
               4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
               5    (dropped)
    foreign
               1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
               2    (dropped)
    z
               1     497.4118   145.8651     3.41   0.001     205.6382    789.1855
               2    (dropped)
    rep78*foreign
            1  1    (dropped)
            2  1    (dropped)
            3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
            3  2    (dropped)
            4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
            4  2    (dropped)
            5  1    (dropped)
            5  2    (dropped)
    ------------------------------------------------------------------------------

predict produces missing values when asked for predictions at these points. Because z entered the ANOVA model as a categorical variable whose only valid values are 0 and 1, z = .43478259 does not correspond to any level of z.

predict would, for instance, also produce a missing value if you asked for a prediction when rep78 = 3.257, rep78 = 12, etc. After anova, the only valid values for categorical variables for predict are those values present in the ANOVA.
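The same behavior can be checked with predict directly. The sketch below uses the Stata 10-era anova syntax from this FAQ and assumes predict behaves as just described; yhat is a hypothetical variable name, and the count is left for you to inspect rather than stated here.

```stata
* Stata 10-era syntax, matching the commands in this FAQ.
sysuse auto, clear
generate z = trunk < 14
anova weight rep78 foreign rep78*foreign z

* Temporarily overwrite z with its mean, roughly what "adjust z" asks for.
preserve
replace z = .43478259 if e(sample)
predict yhat if e(sample)

* If the explanation above holds, yhat is missing because .43478259
* is not one of the levels of z seen by anova.
count if missing(yhat) & e(sample)
restore
```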

Now watch what happens when I do the following adjust:

. adjust z=0 if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
 Covariate set to value: z = 0
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |               Car type
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3597.41
          |          (401.19)
          | [2794.91,4399.91]
          |
        2 |            3478.1
          |         (190.392)
          | [3097.26,3858.94]
          |
        3 |            3589.6            2341.61
          |         (110.519)          (320.272)
          | [3368.53,3810.67]  [1700.97,2982.25]
          |
        4 |           3642.76            2594.65
          |         (179.137)          (209.548)
          | [3284.43,4001.09]   [2175.5,3013.81]
          |
        5 |           2457.41            2679.67
          |          (401.19)          (193.923)
          | [1654.91,3259.91]  [2291.77,3067.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]

The answers I got are the adjusted predictions when z is 0. I could also get predictions when z is 1 with

. adjust z=1 if e(sample), by(rep for) se ci
    
output omitted

If I ask for predictions at any other values of z besides 0 or 1, I will get missing values from the predictions.

Now suppose that, instead of having z enter the anova model as a categorical variable, you enter it as a continuous variable (a covariate in an ANCOVA).

. anova wei rep for rep*for z, cont(z)

                   Number of obs =      69     R-squared     =  0.6079
                   Root MSE      =  528.54     Adj R-squared =  0.5556
    
          Source |  Partial SS    df       MS           F     Prob > F
   --------------+----------------------------------------------------
           Model |  25984464.5     8  3248058.07      11.63     0.0000
                 |
           rep78 |  1524294.09     4  381073.522       1.36     0.2571
         foreign |  3521325.49     1  3521325.49      12.61     0.0008
   rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
               z |  3248513.88     1  3248513.88      11.63     0.0012
                 |
        Residual |  16761251.4    60   279354.19   
   --------------+----------------------------------------------------
           Total |  42745715.9    68   628613.47   

The ANOVA table looks the same, but the underlying representation is different. There is one fewer column in the design matrix: z now has a single column instead of two (one each for z=0 and z=1). Because z is “continuous”, anova accepts whatever values happen to be in z. Since z has only two levels (0 and 1), the resulting ANOVA table is identical. This would not be true if z had 3 or more levels: the first anova would then have more degrees of freedom for z, while the second anova would continue to have only 1 degree of freedom.
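The point about 3 or more levels can be tried with a made-up variable. In this sketch z3 is a hypothetical three-level recoding of trunk (values 0, 1, 2) that does not appear in the original question; the syntax is again the Stata 10-era one used in this FAQ.

```stata
* z3 is a hypothetical three-level variable, introduced only for illustration.
sysuse auto, clear
generate z3 = (trunk >= 11) + (trunk >= 16)

* Categorical: z3 should get one degree of freedom per level beyond the first.
anova weight rep78 foreign rep78*foreign z3

* Continuous: z3 enters as a single column, so it keeps 1 degree of freedom.
anova weight rep78 foreign rep78*foreign z3, cont(z3)
```

Comparing the df column for z3 across the two tables should show the difference; the two tables are no longer expected to match.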

Here is the underlying regression for the ANOVA above:

. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2679.673   193.9232    13.82   0.000     2291.769    3067.577
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z               -497.4118   145.8651    -3.41   0.001    -789.1855   -205.6382
rep78*foreign
        1  1    (dropped)
        2  1    (dropped)
        3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
        3  2    (dropped)
        4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
        4  2    (dropped)
        5  1    (dropped)
        5  2    (dropped)
------------------------------------------------------------------------------

Unlike the first anova, the z here has only one row in the output. The underlying representation within anova is different.

Here it makes sense for predict after anova to ask for predictions when z is .43478259. As far as anova and predict are concerned, the z variable is continuous and can take on any value.
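The two parameterizations can be tied together with a little arithmetic on the coefficients printed in the two regress tables above. The numbers below are copied from those tables; nothing is re-estimated.

```stata
* Intercept of the continuous-z fit = intercept of the categorical fit
* plus the coefficient on level 1 of z (that is, z == 0).
display 2182.261 + 497.4118

* Reference cell (rep78 = 5, Foreign) evaluated at the mean of z:
* intercept plus the z slope times .43478259.
display 2679.673 + (-497.4118) * .43478259
```

The first line reproduces the _cons of the second regression (2679.673, up to rounding). The second gives about 2463.41, which is 216.27 below the value of 2679.67 reported for that cell when z was set to 0.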

. adjust z if e(sample), by(rep for) se ci
    
----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
    
------------------------------------------------
Repair    |
Record    |               Car type              
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3381.15                   
          |          (382.72)                   
          |  [2615.59,4146.7]                   
          | 
        2 |           3261.84                   
          |         (188.801)                   
          | [2884.18,3639.49]                   
          | 
        3 |           3373.34            2125.34
          |         (103.704)          (307.021)
          |  [3165.9,3580.78]  [1511.21,2739.48]
          | 
        4 |           3426.49            2378.39
          |         (178.887)          (183.146)
          | [3068.66,3784.32]  [2012.04,2744.73]
          | 
        5 |           2241.15            2463.41
          |          (382.72)          (177.058)
          |  [1475.59,3006.7]  [2109.24,2817.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
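
For Stata 11 and later, where margins replaces adjust, a rough equivalent of the adjust calls above might look like the following sketch. It uses factor-variable notation (## for main effects plus interaction, c. for a continuous covariate) and has not been checked against the tables in this FAQ; expect the empty rep78-by-foreign cells to be reported as not estimable.

```stata
* Stata 11+ sketch; not part of the original FAQ.
sysuse auto, clear
generate z = trunk < 14

* z as a continuous covariate, analogous to cont(z) above.
anova weight rep78##foreign c.z

* Cell predictions with z fixed at 0, then with z at its
* estimation-sample mean.
margins rep78#foreign, at(z=0)
margins rep78#foreign, at((mean) z)
```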