Note: This FAQ is for Stata 10 and older versions of Stata.

Stata 11 introduced the margins command, which superseded adjust.

How can I produce adjusted means after ANOVA?

Title   Producing adjusted means after ANOVA
Author Kenneth Higbee, StataCorp

Question:

Someone posed the following question:

I am running some simple ANOVAs and wanted also to produce the adjusted means. The command is a 3-way ANOVA with a single 2-way interaction. All the predictors are dichotomous (0/1) variables. There were a few problems with the output.

First, when I run

. anova opportu2 volsex frcsex volsex*frcsex q3

and then

. adjust q3 if e(sample), by(volsex frcsex) se ci

I get a table with all cells missing.

I then decided to run

. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci

This did work! What is going on?

Answer:

The result you show, comparing when q3 was used as a categorical variable and when it was specified to be a continuous variable in the ANOVA, does not surprise me. Let me explain why using the auto data.

. sysuse auto
(1978 Automobile Data)

. gen z = trunk < 14

. anova wei rep for rep*for z

                           Number of obs =      69     R-squared     =  0.6079
                           Root MSE      =  528.54     Adj R-squared =  0.5556

                  Source |  Partial SS    df       MS           F     Prob > F
           --------------+----------------------------------------------------
                   Model |  25984464.5     8  3248058.07      11.63     0.0000
                         |
                   rep78 |  1524294.09     4  381073.522       1.36     0.2571
                 foreign |  3521325.49     1  3521325.49      12.61     0.0008
           rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
                       z |  3248513.88     1  3248513.88      11.63     0.0012
                         |
                Residual |  16761251.4    60   279354.19   
           --------------+----------------------------------------------------
                   Total |  42745715.9    68   628613.47 

I used the 0/1 variable z as a categorical variable in the anova above.

Now, just as you experienced, when I use adjust to set z to its mean, I get nothing useful.

. adjust z if e(sample), by(for rep) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)

----------------------------------------
          |      Repair Record 1978
 Car type |    1     2     3     4     5
----------+-----------------------------
 Domestic |
          |
          |
          |
  Foreign |
          |
          |
----------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]

Notice the messages about missing values generated.

Think about the parameterization used by ANOVA models. Categorical variables enter the design matrix as a set of indicator (also called dummy) variables. For instance, rep78, which has five levels, becomes 5 columns in the ANOVA design matrix (one of these columns will later be dropped in the estimation process because there are only 4 degrees of freedom for the 5 levels). foreign becomes 2 columns in the design matrix, and one of them will be dropped later in the estimation process. The same is true for the z variable.
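The indicator coding described above can be sketched outside Stata. The Python fragment below is only an illustration and not how anova is implemented; the helper name indicator_columns and the five sample values are made up for the example.

```python
def indicator_columns(values):
    """Expand a categorical variable into one 0/1 column per observed level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical values of a 0/1 variable such as z (not taken from the auto data).
z_values = [0, 1, 1, 0, 1]
levels, rows = indicator_columns(z_values)

print(levels)      # the observed levels, one design-matrix column each
for row in rows:
    print(row)     # exactly one 1 per row, in the column of that row's level

# One column is redundant once a constant is in the model, so a variable
# with k observed levels contributes k - 1 degrees of freedom.
print(len(levels) - 1)
```

Applied to a five-level variable such as rep78, the same helper would return five columns, of which four remain after one is dropped.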

Here is a look at the underlying regression for the ANOVA above:

    . regress
    
          Source |       SS       df       MS              Number of obs =      69
    -------------+------------------------------           F(  8,    60) =   11.63
           Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
        Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
    -------------+------------------------------           Adj R-squared =  0.5556
           Total |  42745715.9    68   628613.47           Root MSE      =  528.54
    
    ------------------------------------------------------------------------------
          weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------------------------------------------------
    _cons            2182.261   187.7289    11.62   0.000     1806.748    2557.775
    rep78
               1         1140   528.5397     2.16   0.035     82.76324    2197.237
               2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
               3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
               4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
               5    (dropped)
    foreign
               1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
               2    (dropped)
    z
               1     497.4118   145.8651     3.41   0.001     205.6382    789.1855
               2    (dropped)
    rep78*foreign
            1  1    (dropped)
            2  1    (dropped)
            3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
            3  2    (dropped)
            4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
            4  2    (dropped)
            5  1    (dropped)
            5  2    (dropped)
    ------------------------------------------------------------------------------

predict produces missing values when asked for predictions for the cells of this table. It does so because z entered the ANOVA model as a categorical variable whose only valid values are 0 and 1, and z = .43478259 corresponds to neither of them.

predict would, for instance, also produce a missing value if you asked for a prediction when rep78 = 3.257, rep78 = 12, etc. After anova, the only valid values for categorical variables for predict are those values present in the ANOVA.
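A rough way to picture this behavior is a lookup keyed on the observed levels. The Python sketch below is an analogy rather than Stata's actual code. It assumes that level 1 of z in the regression output corresponds to z = 0 and that the dropped level 2 corresponds to z = 1; the names Z_LEVEL_COEF and z_contribution are invented for the illustration.

```python
# Coefficients for the categorical z, copied from the regress output above.
# Assumption: level 1 is z == 0; level 2 (z == 1) was dropped, so it adds 0.
Z_LEVEL_COEF = {0: 497.4118, 1: 0.0}

def z_contribution(z):
    """Return z's part of the linear prediction, or None when z is not a level."""
    return Z_LEVEL_COEF.get(z)

print(z_contribution(0))           # coefficient for the level z == 0
print(z_contribution(1))           # the dropped level contributes nothing
print(z_contribution(0.43478259))  # None: the mean of z is not an observed level
```

None here plays the role of Stata's missing value: there is simply no column of the design matrix to which the mean of z belongs.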

Now watch what happens when I do the following adjust:

. adjust z=0 if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
 Covariate set to value: z = 0
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |               Car type
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3597.41
          |          (401.19)
          | [2794.91,4399.91]
          |
        2 |            3478.1
          |         (190.392)
          | [3097.26,3858.94]
          |
        3 |            3589.6            2341.61
          |         (110.519)          (320.272)
          | [3368.53,3810.67]  [1700.97,2982.25]
          |
        4 |           3642.76            2594.65
          |         (179.137)          (209.548)
          | [3284.43,4001.09]   [2175.5,3013.81]
          |
        5 |           2457.41            2679.67
          |          (401.19)          (193.923)
          | [1654.91,3259.91]  [2291.77,3067.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]

The answers I got are the adjusted predictions when z is 0. I could also get predictions when z is 1 with

. adjust z=1 if e(sample), by(rep for) se ci
    
output omitted

If I ask for predictions at any other values of z besides 0 or 1, I will get missing values from the predictions.

Now suppose that, instead of having z enter the anova model as a categorical variable, we enter it as a continuous variable (a covariate in an ANCOVA).

. anova wei rep for rep*for z, cont(z)

                   Number of obs =      69     R-squared     =  0.6079
                   Root MSE      =  528.54     Adj R-squared =  0.5556
    
          Source |  Partial SS    df       MS           F     Prob > F
   --------------+----------------------------------------------------
           Model |  25984464.5     8  3248058.07      11.63     0.0000
                 |
           rep78 |  1524294.09     4  381073.522       1.36     0.2571
         foreign |  3521325.49     1  3521325.49      12.61     0.0008
   rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
               z |  3248513.88     1  3248513.88      11.63     0.0012
                 |
        Residual |  16761251.4    60   279354.19   
   --------------+----------------------------------------------------
           Total |  42745715.9    68   628613.47   

The ANOVA table looks the same, but the underlying representation is different. There is one fewer column in the design matrix: z has only one column instead of two (one each for z=0 and z=1). Because z is “continuous”, anova accepts whatever values happen to be in z. Because z has only two levels (0 and 1), the resulting ANOVA table is identical. This would not be true if z had three or more levels: the first anova would then have more than 1 degree of freedom for z, while the second anova would still have only 1.
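The degrees-of-freedom bookkeeping can be checked with a few lines of arithmetic. In the Python sketch below, the per-term df are copied from the ANOVA tables above; the function z_df and the three-level case are hypothetical and only illustrate the rule stated in the text.

```python
# Degrees of freedom for each term, copied from the ANOVA tables above.
term_df = {"rep78": 4, "foreign": 1, "rep78*foreign": 2, "z": 1}
model_df = sum(term_df.values())
print(model_df)  # should agree with the df shown on the Model row

def z_df(n_levels, continuous):
    """df used by z: 1 as a covariate, n_levels - 1 as a categorical factor."""
    return 1 if continuous else n_levels - 1

# With two levels both treatments use 1 df; with a hypothetical third
# level the categorical version would use one more than the covariate.
for n_levels in (2, 3):
    print(n_levels, z_df(n_levels, continuous=False), z_df(n_levels, continuous=True))
```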

Here is the underlying regression for the ANOVA above:

. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2679.673   193.9232    13.82   0.000     2291.769    3067.577
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z               -497.4118   145.8651    -3.41   0.001    -789.1855   -205.6382
rep78*foreign
        1  1    (dropped)
        2  1    (dropped)
        3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
        3  2    (dropped)
        4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
        4  2    (dropped)
        5  1    (dropped)
        5  2    (dropped)
------------------------------------------------------------------------------

Unlike the first anova, the z here has only one row in the output. The underlying representation within anova is different.
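The two regression tables describe the same fit in different coordinates, which can be verified by hand. The Python sketch below uses only numbers copied from the two regress outputs and looks at the base cell, where every rep78 and foreign term is a dropped level. It assumes that level 1 of the categorical z is z = 0; the variable names are mine.

```python
# Values copied from the two regress tables above.
cons_categorical = 2182.261  # _cons when z is categorical (level for z == 1 dropped)
coef_z_level1 = 497.4118     # coefficient on level 1 of z, read here as z == 0
cons_continuous = 2679.673   # _cons when z is continuous
slope_z = -497.4118          # coefficient on the continuous z

# Prediction for the base cell at z = 0 and z = 1 under each parameterization.
pred_z0_categorical = cons_categorical + coef_z_level1
pred_z0_continuous = cons_continuous + slope_z * 0
pred_z1_categorical = cons_categorical
pred_z1_continuous = cons_continuous + slope_z * 1

# Both gaps should be zero up to the rounding of the displayed coefficients.
print(abs(pred_z0_categorical - pred_z0_continuous))
print(abs(pred_z1_categorical - pred_z1_continuous))
```

The only thing that changes between the two models is which value of z the constant refers to and the sign of the z coefficient. The continuous version can also be evaluated at values between 0 and 1.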

Here it makes sense for predict after anova to ask for predictions when z is .43478259. As far as anova and predict are concerned, the z variable is continuous and can take on any value.

. adjust z if e(sample), by(rep for) se ci
    
----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
    
------------------------------------------------
Repair    |
Record    |               Car type              
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3381.15                   
          |          (382.72)                   
          |  [2615.59,4146.7]                   
          | 
        2 |           3261.84                   
          |         (188.801)                   
          | [2884.18,3639.49]                   
          | 
        3 |           3373.34            2125.34
          |         (103.704)          (307.021)
          |  [3165.9,3580.78]  [1511.21,2739.48]
          | 
        4 |           3426.49            2378.39
          |         (178.887)          (183.146)
          | [3068.66,3784.32]  [2012.04,2744.73]
          | 
        5 |           2241.15            2463.41
          |          (382.72)          (177.058)
          |  [1475.59,3006.7]  [2109.24,2817.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
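As a cross-check, the predictions in this table can be recovered from the earlier adjust z=0 table. With z continuous, moving z from 0 to its mean shifts every cell by the same amount, the z coefficient times the mean of z. The Python sketch below uses only figures printed above. It relies on the two models giving the same predictions at z = 0, which holds here because z has only two levels. The dictionary and variable names are mine.

```python
mean_z = 0.43478259   # mean of z reported by adjust
slope_z = -497.4118   # coefficient on the continuous z

# Linear predictions at z = 0, copied from the adjust z=0 table.
at_z0 = {
    ("Domestic", 1): 3597.41, ("Domestic", 2): 3478.10,
    ("Domestic", 3): 3589.60, ("Domestic", 4): 3642.76,
    ("Domestic", 5): 2457.41, ("Foreign", 3): 2341.61,
    ("Foreign", 4): 2594.65, ("Foreign", 5): 2679.67,
}

# Linear predictions at the mean of z, copied from the table above.
at_mean_reported = {
    ("Domestic", 1): 3381.15, ("Domestic", 2): 3261.84,
    ("Domestic", 3): 3373.34, ("Domestic", 4): 3426.49,
    ("Domestic", 5): 2241.15, ("Foreign", 3): 2125.34,
    ("Foreign", 4): 2378.39, ("Foreign", 5): 2463.41,
}

shift = slope_z * mean_z  # common shift applied to every cell
max_gap = max(abs(at_z0[cell] + shift - at_mean_reported[cell]) for cell in at_z0)

print(round(shift, 3))
print(max_gap)  # small: only display rounding separates the two tables
```

The standard errors do not shift in this simple way, so the sketch checks the point predictions only.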