This FAQ is for Stata 10 and older versions of Stata. Stata 11 introduced the margins command, which superseded adjust.

How can I produce adjusted means after ANOVA?

Title: Producing adjusted means after ANOVA
Author: Kenneth Higbee, StataCorp
Date: March 2001; updated April 2005

Question:

Someone posed the following question:

I am running some simple ANOVAs and wanted also to produce the adjusted means. The command is a 3-way ANOVA with a single 2-way interaction. All the predictors are dichotomous (0/1) variables. There were a few problems with the output.

First, when I run

. anova opportu2 volsex frcsex volsex*frcsex q3


and then

. adjust q3 if e(sample), by(volsex frcsex) se ci


I get a table with all cells missing.

I then decided to run

. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci


This did work! What is going on?

Answer:

The result you show, comparing when q3 was used as a categorical variable and when it was specified to be a continuous variable in the ANOVA, does not surprise me. Let me explain why using the auto data.

. sysuse auto
(1978 Automobile Data)

. gen z = trunk < 14

. anova wei rep for rep*for z

Number of obs =      69     R-squared     =  0.6079
Root MSE      =  528.54     Adj R-squared =  0.5556

       Source |  Partial SS    df       MS           F     Prob > F
--------------+----------------------------------------------------
        Model |  25984464.5     8  3248058.07      11.63     0.0000
              |
        rep78 |  1524294.09     4  381073.522       1.36     0.2571
      foreign |  3521325.49     1  3521325.49      12.61     0.0008
rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
            z |  3248513.88     1  3248513.88      11.63     0.0012
              |
     Residual |  16761251.4    60   279354.19
--------------+----------------------------------------------------
        Total |  42745715.9    68   628613.47


I used the 0/1 variable z as a categorical variable in the anova above.

Now, just like you experienced, when I use adjust to adjust to the MEAN of z, I get nothing useful.

. adjust z if e(sample), by(for rep) se ci

----------------------------------------------------------------------------
Dependent variable: weight     Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)

----------------------------------------
          |      Repair Record 1978
 Car type |    1     2     3     4     5
----------+-----------------------------
 Domestic |
          |
          |
          |
  Foreign |
          |
          |
----------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]


Notice the messages about missing values generated.

Think about the parameterization used by ANOVA models. Categorical variables enter the design matrix as a set of indicator (also called dummy) variables. For instance, rep78, which has five levels, becomes 5 columns in the ANOVA design matrix (one of these columns will later be dropped in the estimation process, since there are only 4 degrees of freedom for the 5 levels). foreign becomes 2 columns in the design matrix, and one of them will be dropped later in the estimation process. The same is true for the z variable.
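To make the indicator columns concrete, here is a minimal sketch that builds them by hand with tabulate's generate() option. It is only an illustration of the idea, not what anova runs internally, and the stub names rep_, frn_, and z_ are hypothetical names chosen for this example.

```stata
* Sketch only: build indicator (dummy) columns by hand.
* The stubs rep_, frn_, and z_ are made-up names for this illustration.
sysuse auto, clear
generate z = trunk < 14

* One 0/1 column per level: rep_1 ... rep_5, frn_1 frn_2, z_1 z_2
tabulate rep78, generate(rep_)
tabulate foreign, generate(frn_)
tabulate z, generate(z_)

* Within each set, exactly one indicator equals 1 in every nonmissing row,
* so the set sums to the constant and one column per set is redundant.
assert rep_1 + rep_2 + rep_3 + rep_4 + rep_5 == 1 if !missing(rep78)
assert frn_1 + frn_2 == 1
assert z_1 + z_2 == 1
```

Here z_1 flags z = 0 and z_2 flags z = 1. No column corresponds to a value such as z = .43478259.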

Here is a look at the underlying regression for the ANOVA above:

. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2182.261   187.7289    11.62   0.000     1806.748    2557.775
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z
           1     497.4118   145.8651     3.41   0.001     205.6382    789.1855
           2    (dropped)
rep78*foreign
        1  1    (dropped)
        2  1    (dropped)
        3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
        3  2    (dropped)
        4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
        4  2    (dropped)
        5  1    (dropped)
        5  2    (dropped)
------------------------------------------------------------------------------


predict produces missing values when asked to produce predictions at these points. It does this because z entered the ANOVA model as a categorical variable with 0 and 1 as its only valid values, and z = .43478259 doesn’t correspond to either 0 or 1.

predict would, for instance, also produce a missing value if you asked for a prediction when rep78 = 3.257, rep78 = 12, etc. After anova, the only valid values for categorical variables for predict are those values present in the ANOVA.

Now watch what happens when I do the following adjust:

. adjust z=0 if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
Dependent variable: weight     Command: anova
Covariate set to value: z = 0
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |               Car type
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3597.41
          |          (401.19)
          | [2794.91,4399.91]
          |
        2 |            3478.1
          |         (190.392)
          | [3097.26,3858.94]
          |
        3 |            3589.6            2341.61
          |         (110.519)          (320.272)
          | [3368.53,3810.67]  [1700.97,2982.25]
          |
        4 |           3642.76            2594.65
          |         (179.137)          (209.548)
          | [3284.43,4001.09]   [2175.5,3013.81]
          |
        5 |           2457.41            2679.67
          |          (401.19)          (193.923)
          | [1654.91,3259.91]  [2291.77,3067.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]


The answers I got are the adjusted predictions when z is 0. I could also get predictions when z is 1 with

. adjust z=1 if e(sample), by(rep for) se ci

output omitted


If I ask for predictions at any other values of z besides 0 or 1, I will get missing values from the predictions.
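As a check on how these predictions arise, the cell values for z = 0 can be rebuilt from the coefficients of the underlying regression. The mapping of level numbers to values is an inference here, not something the output states: level 1 of z is taken to be z = 0, and level 1 of foreign is taken to be Domestic. Under that reading, the sums reproduce the table entries up to rounding.

```latex
\begin{aligned}
\hat{y}(\text{rep78}=5,\ \text{Foreign},\ z=0)
  &= 2182.261 + 497.4118 = 2679.673 \\
\hat{y}(\text{rep78}=5,\ \text{Domestic},\ z=0)
  &= 2182.261 + 497.4118 - 222.2614 = 2457.411 \\
\hat{y}(\text{rep78}=3,\ \text{Domestic},\ z=0)
  &= 2182.261 + 497.4118 - 222.2614 - 338.0654 + 1470.257 = 3589.603
\end{aligned}
```

For z = 1 the 497.4118 term drops out, because that level's column was dropped. For any other value of z, neither indicator column applies, so there is no term to add and the prediction is missing.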

Now, instead of having z enter the anova model as a categorical variable, suppose you send it in as a continuous variable (a covariate in an ANCOVA).

. anova wei rep for rep*for z, cont(z)

Number of obs =      69     R-squared     =  0.6079
Root MSE      =  528.54     Adj R-squared =  0.5556

       Source |  Partial SS    df       MS           F     Prob > F
--------------+----------------------------------------------------
        Model |  25984464.5     8  3248058.07      11.63     0.0000
              |
        rep78 |  1524294.09     4  381073.522       1.36     0.2571
      foreign |  3521325.49     1  3521325.49      12.61     0.0008
rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
            z |  3248513.88     1  3248513.88      11.63     0.0012
              |
     Residual |  16761251.4    60   279354.19
--------------+----------------------------------------------------
        Total |  42745715.9    68   628613.47


The ANOVA table looks the same, but the underlying representation is different. There is one fewer column in the design matrix: z has only one column instead of two (one each for z=0 and z=1). Because z is “continuous”, anova is happy with whatever values might happen to be in z. Since z has only two levels (0 and 1), the resulting ANOVA table is identical. This would not be true if z had three or more levels: the first anova would then have more degrees of freedom for z, while the second anova would still have only 1 degree of freedom.
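The degrees-of-freedom point can be tried out with a three-level variable. The sketch below is an illustration only: z3 and its cut points (12 and 16) are hypothetical choices made for this example, and the commands use the Stata 10 syntax found throughout this FAQ.

```stata
* Sketch only: z3 is a made-up three-level variable (values 0, 1, 2)
sysuse auto, clear
generate z3 = (trunk >= 12) + (trunk >= 16)

* Categorical z3: one indicator column per level
anova weight rep78 foreign rep78*foreign z3

* Continuous z3: a single column, hence a single slope
anova weight rep78 foreign rep78*foreign z3, cont(z3)
```

Provided all three levels occur in the estimation sample, the first table should show 2 degrees of freedom for z3 and the second 1, so the two tables would no longer agree.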

Here is the underlying regression for the ANOVA above:

. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2679.673   193.9232    13.82   0.000     2291.769    3067.577
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z               -497.4118   145.8651    -3.41   0.001    -789.1855   -205.6382
rep78*foreign
        1  1    (dropped)
        2  1    (dropped)
        3  1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
        3  2    (dropped)
        4  1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
        4  2    (dropped)
        5  1    (dropped)
        5  2    (dropped)
------------------------------------------------------------------------------


Unlike the first anova, the z here has only one row in the output. The underlying representation within anova is different.

Here it makes sense for predict after anova to ask for predictions when z is .43478259. As far as anova and predict are concerned, the z variable is continuous and can take on any value.
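The arithmetic behind such a prediction can be sketched from the regression above. This assumes that the Foreign, rep78 = 5 cell is the one whose indicator columns were all dropped, so that only _cons and the slope on z contribute:

```latex
\hat{y}(\text{rep78}=5,\ \text{Foreign},\ z=0.43478259)
  = 2679.673 + (-497.4118)(0.43478259)
  \approx 2679.673 - 216.266
  = 2463.407
```

Up to rounding, this is the 2463.41 that adjust reports for that cell when z is set to its mean.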

. adjust z if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
Dependent variable: weight     Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |               Car type
1978      |          Domestic            Foreign
----------+-------------------------------------
        1 |           3381.15
          |          (382.72)
          |  [2615.59,4146.7]
          |
        2 |           3261.84
          |         (188.801)
          | [2884.18,3639.49]
          |
        3 |           3373.34            2125.34
          |         (103.704)          (307.021)
          |  [3165.9,3580.78]  [1511.21,2739.48]
          |
        4 |           3426.49            2378.39
          |         (178.887)          (183.146)
          | [3068.66,3784.32]  [2012.04,2744.73]
          |
        5 |           2241.15            2463.41
          |          (382.72)          (177.058)
          |  [1475.59,3006.7]  [2109.24,2817.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
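For readers on Stata 11 or later, where margins superseded adjust, a rough counterpart of the last two commands might look like the sketch below. It is not part of the original FAQ and has not been checked against the tables above. It assumes factor-variable notation, in which rep78##foreign expands to both main effects plus their interaction and c.z marks z as continuous.

```stata
* Sketch for Stata 11 or later; not run against the output shown in this FAQ
sysuse auto, clear
generate z = trunk < 14

* c.z enters z as a continuous covariate, like cont(z) in the older syntax
anova weight rep78##foreign c.z

* Predictions for each rep78-by-foreign cell, with z held at its mean
margins rep78#foreign, at((mean) z)
```

margins works from the estimation sample by default, so no if e(sample) qualifier is written here.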