This FAQ is for Stata 10 and older versions of Stata. Stata 11 introduced the
margins command, which superseded adjust.
How can I produce adjusted means after ANOVA?
Title: Producing adjusted means after ANOVA
Author: Kenneth Higbee, StataCorp
Date: March 2001; updated April 2005
Question:
Someone posed the following question:
I am running some simple ANOVAs and wanted also to produce the adjusted
means.
The command is a 3-way ANOVA with a single 2-way interaction. All the
predictors are dichotomous (0/1) variables.
There were a few problems with the output.
First, when I run
. anova opportu2 volsex frcsex volsex*frcsex q3
and then
. adjust q3 if e(sample), by(volsex frcsex) se ci
I get a table with all cells missing.
I then decided to run
. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci
This did work! What is going on?
Answer:
The difference you see between treating q3 as a categorical variable and
specifying it as a continuous variable in the ANOVA does not surprise me.
Let me explain why using the auto data.
. sysuse auto
(1978 Automobile Data)
. gen z = trunk < 14
. anova wei rep for rep*for z
                Number of obs =      69     R-squared     =  0.6079
                Root MSE      =  528.54     Adj R-squared =  0.5556
       Source |  Partial SS    df       MS           F     Prob > F
--------------+----------------------------------------------------
        Model |  25984464.5     8  3248058.07      11.63     0.0000
              |
        rep78 |  1524294.09     4  381073.522       1.36     0.2571
      foreign |  3521325.49     1  3521325.49      12.61     0.0008
rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
            z |  3248513.88     1  3248513.88      11.63     0.0012
              |
     Residual |  16761251.4    60   279354.19
--------------+----------------------------------------------------
        Total |  42745715.9    68   628613.47
I used the 0/1 variable z as a categorical variable in the anova above.
Now, just as you experienced, when I use adjust to adjust to the mean of z,
I get nothing useful.
. adjust z if e(sample), by(for rep) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)
----------------------------------------
| Repair Record 1978
Car type | 1 2 3 4 5
----------+-----------------------------
Domestic |
|
|
|
Foreign |
|
|
----------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
Notice the messages about missing values generated.
Think about the parameterization used by ANOVA models. Categorical
variables enter the design matrix as a set of indicator (also called dummy)
variables. For instance, rep78, which has five levels, becomes 5 columns in
the ANOVA design matrix (one of these columns will later be dropped in the
estimation process because there are only 4 degrees of freedom for the 5
levels). foreign becomes 2 columns in the design matrix, and one of them
will be dropped later in the estimation process. The same is true for the
z variable.
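To make the indicator columns concrete, here is a minimal sketch that builds them by hand with tabulate's generate() option. It assumes the auto data are still in memory and z has been generated as above; the stub names rep_ and zcat_ are my own choice for illustration. anova constructs its design matrix internally, so this only mimics the idea.

```stata
* Assumes the auto data are in memory and z was generated as above.
* The stub names rep_ and zcat_ are hypothetical, chosen for illustration.

* One 0/1 indicator per level of rep78: rep_1 through rep_5.
tabulate rep78, generate(rep_)

* One 0/1 indicator per level of z: zcat_1 (z==0) and zcat_2 (z==1).
tabulate z, generate(zcat_)

* Inspect a few rows: each categorical value lights up exactly one column.
list rep78 rep_* z zcat_* in 1/5
```

A value such as z = .43478259 has no indicator column of its own, which is the root of the missing predictions discussed next.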
Here is a look at the underlying regression for the ANOVA above:
. regress
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54
------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2182.261   187.7289    11.62   0.000     1806.748    2557.775
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z
           1     497.4118   145.8651     3.41   0.001     205.6382    789.1855
           2    (dropped)
rep78*foreign
         1 1    (dropped)
         2 1    (dropped)
         3 1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
         3 2    (dropped)
         4 1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
         4 2    (dropped)
         5 1    (dropped)
         5 2    (dropped)
------------------------------------------------------------------------------
adjust obtains its results from predict, and predict produces missing values
when asked to produce predictions for these 10 points. It does this because
z entered the ANOVA model as a categorical variable with 0 and 1 as its only
valid values, and z = .43478259 corresponds to neither 0 nor 1. predict
would, for instance, also produce a missing value if you asked for a
prediction when rep78 = 3.257, rep78 = 12, etc. After anova, the only valid
values for categorical variables for predict are those values present in
the ANOVA.
Now watch what happens when I do the following
adjust:
. adjust z=0 if e(sample), by(rep for) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to value: z = 0
----------------------------------------------------------------------------
------------------------------------------------
Repair |
Record | Car type
1978 | Domestic Foreign
----------+-------------------------------------
1 | 3597.41
| (401.19)
| [2794.91,4399.91]
|
2 | 3478.1
| (190.392)
| [3097.26,3858.94]
|
3 | 3589.6 2341.61
| (110.519) (320.272)
| [3368.53,3810.67] [1700.97,2982.25]
|
4 | 3642.76 2594.65
| (179.137) (209.548)
| [3284.43,4001.09] [2175.5,3013.81]
|
5 | 2457.41 2679.67
| (401.19) (193.923)
| [1654.91,3259.91] [2291.77,3067.58]
------------------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
The answers I got are the adjusted predictions when
z is 0. I could also get predictions when
z is 1 with
. adjust z=1 if e(sample), by(rep for) se ci
(output omitted)
If I ask for predictions at any other values of z besides 0 or 1, I
will get missing values from the predictions.
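As a quick check of this point, one could request a value that z never takes. The value 0.5 below is an arbitrary choice of mine, and the sketch assumes the anova with z as a categorical variable is still the active estimation. Based on the explanation above, I would expect every cell of the table to come back missing.

```stata
* Assumes the categorical-z anova above is the most recent estimation.
* 0.5 is an arbitrary value that is neither 0 nor 1.
adjust z=.5 if e(sample), by(rep for) se ci
```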
Now, instead of having z enter the anova model as a categorical variable,
suppose you send it in as a continuous variable (a covariate in an ANCOVA).
. anova wei rep for rep*for z, cont(z)
                Number of obs =      69     R-squared     =  0.6079
                Root MSE      =  528.54     Adj R-squared =  0.5556
       Source |  Partial SS    df       MS           F     Prob > F
--------------+----------------------------------------------------
        Model |  25984464.5     8  3248058.07      11.63     0.0000
              |
        rep78 |  1524294.09     4  381073.522       1.36     0.2571
      foreign |  3521325.49     1  3521325.49      12.61     0.0008
rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
            z |  3248513.88     1  3248513.88      11.63     0.0012
              |
     Residual |  16761251.4    60   279354.19
--------------+----------------------------------------------------
        Total |  42745715.9    68   628613.47
The ANOVA table looks the same, but the underlying representation is
different. There is one fewer column in the design matrix: z now has only
one column instead of two (corresponding to z=0 and z=1). Because z is
“continuous”, the ANOVA is happy with whatever values might happen to be in
z. Because z had only two levels (0 and 1), the resulting ANOVA table is
identical. This would not be true if z had 3 or more levels. Then the first
anova would have had more degrees of freedom for z, while the second anova
would continue to have only 1 degree of freedom.
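The point about three or more levels can be tried directly. The sketch below assumes the auto data and z are in memory; the variable z3 and its cut points (11 and 16) are hypothetical choices of mine, not part of the original example. The z3 row should show 2 degrees of freedom in the first table and 1 in the second.

```stata
* Hypothetical three-level variable (0, 1, 2); cut points are arbitrary.
generate z3 = (trunk >= 11) + (trunk >= 16)
tabulate z3

* z3 as categorical: expect 2 degrees of freedom on the z3 row.
anova weight rep78 foreign rep78*foreign z3

* z3 as continuous: expect 1 degree of freedom on the z3 row.
anova weight rep78 foreign rep78*foreign z3, cont(z3)

* Refit the two-level model so later replay commands refer to it again.
anova weight rep78 foreign rep78*foreign z, cont(z)
```

The final line is there only so that the continuous-z model is once more the active estimation for the commands that follow.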
Here is the underlying regression for the ANOVA above:
. regress
      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54
------------------------------------------------------------------------------
      weight        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2679.673   193.9232    13.82   0.000     2291.769    3067.577
rep78
           1         1140   528.5397     2.16   0.035     82.76324    2197.237
           2     1020.691   431.9311     2.36   0.021     156.7003    1884.682
           3    -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
           4    -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
           5    (dropped)
foreign
           1    -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
           2    (dropped)
z               -497.4118   145.8651    -3.41   0.001    -789.1855   -205.6382
rep78*foreign
         1 1    (dropped)
         2 1    (dropped)
         3 1     1470.257   536.9423     2.74   0.008     396.2125    2544.301
         3 2    (dropped)
         4 1     1270.366   504.0553     2.52   0.014     262.1051    2278.627
         4 2    (dropped)
         5 1    (dropped)
         5 2    (dropped)
------------------------------------------------------------------------------
Unlike the first anova, z here has only one row in the output. The
underlying representation within anova is different.
Here it makes sense to ask predict after anova for predictions when z is
.43478259. As far as anova and predict are concerned, the z variable is
continuous and can take on any value.
. adjust z if e(sample), by(rep for) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
------------------------------------------------
Repair |
Record | Car type
1978 | Domestic Foreign
----------+-------------------------------------
1 | 3381.15
| (382.72)
| [2615.59,4146.7]
|
2 | 3261.84
| (188.801)
| [2884.18,3639.49]
|
3 | 3373.34 2125.34
| (103.704) (307.021)
| [3165.9,3580.78] [1511.21,2739.48]
|
4 | 3426.49 2378.39
| (178.887) (183.146)
| [3068.66,3784.32] [2012.04,2744.73]
|
5 | 2241.15 2463.41
| (382.72) (177.058)
| [1475.59,3006.7] [2109.24,2817.58]
------------------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
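For readers on Stata 11 or later, where margins superseded adjust, a rough counterpart of the adjust calls above might look like the following. This is only a sketch: it assumes factor-variable notation (rep78#foreign for the interaction, c.z to mark z as continuous) in place of the cont() option, and I would expect the two empty cells (Foreign cars with rep78 of 1 or 2) to be reported as not estimable.

```stata
* Sketch for Stata 11 or later; factor-variable notation replaces cont().
sysuse auto, clear
generate z = trunk < 14

* rep78 and foreign are categorical by default; c.z marks z as continuous.
anova weight rep78 foreign rep78#foreign c.z

* Cell predictions with z held at its estimation-sample mean.
margins rep78#foreign, atmeans

* Cell predictions with z fixed at 0.
margins rep78#foreign, at(z=0)
```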