Why do I get an error message when I try to run a repeated-measures ANOVA?
Title: Repeated-measures ANOVA examples
Authors: Kenneth Higbee, StataCorp; Wesley Eddings, StataCorp
Date: February 2000; updated August 2009; minor revisions July 2011
Introduction
Examples with one repeated variable
Examples with two or more repeated variables
Summary
References
Introduction
Repeated-measures ANOVA, obtained with the repeated() option of the
anova command,
requires more structural information about your model than a regular ANOVA,
as mentioned in the technical note on page 55 of [R] anova. When
this information cannot be determined from your anova command, you get
error messages such as
could not determine between-subject error term; use bse() option
r(421);
or
could not determine between-subject basic unit; use bseunit() option
r(422);
These error messages can almost always be avoided with the proper
specification of your ANOVA model.
You can jump ahead to the summary to see a list of
common user errors and how to overcome them. The examples presented here
demonstrate how to obtain a repeated-measures ANOVA and show ways to
overcome common errors.
The command wsanova, written by John Gleason and presented in article
sg103 of STB-47 (Gleason 1999), provides a different syntax for specifying
certain types of repeated-measures ANOVA designs. Not all repeated-measures
ANOVA designs are supported by wsanova, but for some problems you
might find the syntax more intuitive. (See below for
installation instructions.) In other cases, using Stata’s
anova command with the repeated() option may be the more
natural, or the only, way to obtain the analysis you seek.
The anova manual entry (see pages 51–58 of [R] anova)
presents three repeated-measures ANOVA examples. The examples range from a
simple dataset having five persons with measures on four drugs taken from
table 4.3 of Winer, Brown, and Michels (1991), to the more complicated data
from table 7.13 of Winer, Brown, and Michels (1991) involving two
repeated-measures variables (and their interactions) along with a
between-subjects term.
Gleason (1999) demonstrates the wsanova command with data from Cole
and Grizzle (1966). With these data he provides three examples that
illustrate a repeated-measures ANOVA with none, one, and two
between-subjects factors.
Here I use the anova and wsanova commands to specify
various types of repeated-measures ANOVAs. I repeat the examples from the
anova manual entry and the wsanova STB article (Gleason 1999) and
present a couple of other examples: seven examples involving one
repeated variable and three examples involving two repeated variables.
Along the way I comment on the common mistakes users make in
specifying these kinds of models and show how to overcome them.
Examples with one repeated variable
The following examples illustrate various ways repeated-measures ANOVA
models with one repeated measure variable may be specified in Stata. I
start with the simplest repeated measures design and progress through more
complicated designs. I demonstrate how to use both the anova command
and the wsanova command (when possible) and discuss potential
problems and possible solutions.
Person repeated on drug example from the anova manual entry
The example starting on page 51 of [R] anova is taken from table 4.3
of Winer, Brown, and Michels (1991). Using
tabdisp we can
get a tabular view of the data.
. use http://www.stata-press.com/data/r12/t43
(T4.3 -- Winer, Brown, Michels)
. tabdisp person drug, cellvar(score)
----------------------------------
| drug
person | 1 2 3 4
----------+-----------------------
1 | 30 28 16 34
2 | 14 18 10 22
3 | 24 20 18 30
4 | 38 34 20 44
5 | 26 28 14 30
----------------------------------
The data are in long format.
. list, sep(4)
+-----------------------+
| person drug score |
|-----------------------|
1. | 1 1 30 |
2. | 1 2 28 |
3. | 1 3 16 |
4. | 1 4 34 |
|-----------------------|
5. | 2 1 14 |
6. | 2 2 18 |
7. | 2 3 10 |
8. | 2 4 22 |
|-----------------------|
9. | 3 1 24 |
10. | 3 2 20 |
11. | 3 3 18 |
12. | 3 4 30 |
|-----------------------|
13. | 4 1 38 |
14. | 4 2 34 |
15. | 4 3 20 |
16. | 4 4 44 |
|-----------------------|
17. | 5 1 26 |
18. | 5 2 28 |
19. | 5 3 14 |
20. | 5 4 30 |
+-----------------------+
A common error is to try to run the anova (or
wsanova) command with the data in wide format. For instance, if my
data looked like this
. list
+----------------------------------------+
| person drug1 drug2 drug3 drug4 |
|----------------------------------------|
1. | 1 30 28 16 34 |
2. | 2 14 18 10 22 |
3. | 3 24 20 18 30 |
4. | 4 38 34 20 44 |
5. | 5 26 28 14 30 |
+----------------------------------------+
I would not be able to run the appropriate anova command. The data
can be changed to the long format needed by anova by using the
reshape command.
. reshape long drug, i(person) j(dr)
(note: j = 1 2 3 4)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 5 -> 20
Number of variables 5 -> 3
j variable (4 values) -> dr
xij variables:
drug1 drug2 ... drug4 -> drug
-----------------------------------------------------------------------------
I would then have to rename the drug variable to score and rename the
dr variable to drug to match the variable names shown in my earlier
listing of the original long-format dataset.
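The reshape and rename steps just described can be sketched end to end. The block below types in the wide-format listing shown above with input, so it does not depend on any file; the intermediate name dr is simply the j() name used in the reshape command above.

```stata
* Wide-format data, typed in from the listing above
clear
input person drug1 drug2 drug3 drug4
1 30 28 16 34
2 14 18 10 22
3 24 20 18 30
4 38 34 20 44
5 26 28 14 30
end

* One row per person-drug combination; the drug number goes into dr
reshape long drug, i(person) j(dr)

* Match the variable names of the original long-format dataset
rename drug score
rename dr drug

list, sep(4)
```

After these steps the data have 20 observations on person, drug, and score, which is the layout anova expects.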
The repeated-measures anova for this example is
. anova score person drug, repeated(drug)
Number of obs = 20 R-squared = 0.9244
Root MSE = 3.06594 Adj R-squared = 0.8803
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 1379.00 7 197.00 20.96 0.0000
|
person | 680.80 4 170.20 18.11 0.0001
drug | 698.20 3 232.733333 24.76 0.0000
|
Residual | 112.80 12 9.40
-----------+----------------------------------------------------
Total | 1491.80 19 78.5157895
Between-subjects error term: person
Levels: 5 (4 df)
Lowest b.s.e. variable: person
Repeated variable: drug
Huynh-Feldt epsilon = 1.0789
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.6049
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
drug | 3 24.76 0.0000 0.0000 0.0006 0.0076
Residual | 12
-----------+----------------------------------------------------
An explanation of the output is included in the manual.
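As a rough illustration of where the corrected p-values come from: each correction multiplies both the numerator and the denominator degrees of freedom of the F test by the corresponding epsilon. The sketch below copies the mean squares and epsilons from the output above; the scalar names are my own, and the displayed values should agree with the table only up to rounding of the printed epsilons.

```stata
* Values copied from the anova output above
scalar Fstat  = 232.733333/9.40   // MS(drug) / MS(Residual)
scalar df1    = 3                 // df for drug
scalar df2    = 12                // residual df
scalar eps_hf = min(1, 1.0789)    // Huynh-Feldt epsilon, reset to 1 when above 1
scalar eps_gg = 0.6049            // Greenhouse-Geisser epsilon
scalar eps_bx = 1/df1             // Box's conservative epsilon: 1/(levels - 1)

* Ftail(a, b, f) is the upper-tail probability of an F(a, b) distribution
display "Regular: " %6.4f Ftail(df1, df2, Fstat)
display "H-F:     " %6.4f Ftail(eps_hf*df1, eps_hf*df2, Fstat)
display "G-G:     " %6.4f Ftail(eps_gg*df1, eps_gg*df2, Fstat)
display "Box:     " %6.4f Ftail(eps_bx*df1, eps_bx*df2, Fstat)
```

With four drug levels, Box's epsilon is 1/3, so the Box column is the p-value of an F test on 1 and 4 degrees of freedom.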
A common error that might be made when trying to run anova on this
simple example is to enter
. anova score drug, repeated(drug)
could not determine between-subject error term; use bse() option
r(421);
You might be tempted, after seeing the above error message, to type
. anova score drug, repeated(drug) bse(person)
term not in model
r(147);
but this approach also fails. The moral of this last error message is that
to perform the necessary computations for a repeated-measures ANOVA, the
between-subjects error term must be a term in the ANOVA model. Here we need
to have person as one of the terms in the model. This leads to the
correct specification anova score person drug, repeated(drug) as
shown earlier.
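The sequence of attempts above can be replayed in a do-file with capture, which traps the error and leaves the return code in _rc. This is only a sketch: the data are typed in from the earlier listing, and the return codes mentioned in the comments are the ones reported in the text above.

```stata
* Long-format data, typed in from the listing above
clear
input person drug score
1 1 30
1 2 28
1 3 16
1 4 34
2 1 14
2 2 18
2 3 10
2 4 22
3 1 24
3 2 20
3 3 18
3 4 30
4 1 38
4 2 34
4 3 20
4 4 44
5 1 26
5 2 28
5 3 14
5 4 30
end

* person is missing from the model (the text above reports r(421))
capture noisily anova score drug, repeated(drug)
display "return code = " _rc

* bse() cannot name a term that is not in the model (reported as r(147))
capture noisily anova score drug, repeated(drug) bse(person)
display "return code = " _rc

* Correct specification: the between-subjects error term is in the model
anova score person drug, repeated(drug)
```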
The wsanova command presented in STB-47 sg103 (Gleason 1999) can also
perform this analysis. To obtain this command, type net stb 47
followed by net describe sg103, and then follow the installation
instructions. See help stb for details.
. wsanova score drug, id(person) epsilon
Number of obs = 20 R-squared = 0.9244
Root MSE = 3.06594 Adj R-squared = 0.8803
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
person | 680.8 4 170.2
drug | 698.2 3 232.733333 24.76 0.0000
Residual | 112.8 12 9.4
-----------+----------------------------------------------------
Total | 1491.8 19 78.5157895
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.6049
Huynh-Feldt (H-F) epsilon: 1.0000
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
drug | 3 24.76 0.0000 0.0006 0.0000
We get the same information we did with the anova command.
Which command to use for this simple case is a matter of
personal preference. You can either use
anova score person drug, repeated(drug)
or download wsanova and use
wsanova score drug, id(person) epsilon
No between-subjects factors example from wsanova STB article
The examples in Gleason (1999) demonstrating the wsanova command use a
dataset obtained from Cole and Grizzle (1966). With the net command
(also see help stb), you can obtain the dataset, histamin.dta, as well
as the wsanova command. Type net stb 47 followed by net describe sg103,
then follow the instructions.
Gleason’s first example, a “single factor within subject
(randomized blocks) design”, has the same underlying ANOVA design as
the previous example. Because the two examples are so similar, I simply
show how to obtain the analysis using the anova and wsanova commands
without additional comments. The analysis using anova proceeds just as it
did with our previous example. This time, we have lhist measurements
on dogs over time. Unlike in our first example, we restrict the
analysis to the first group of dogs with the if group==1
command qualifier.
. use histamin, clear
(Blood histamine levels in dogs)
. anova lhist dog time if group==1, repeated(time)
Number of obs = 16 R-squared = 0.9388
Root MSE = .409681 Adj R-squared = 0.8979
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 23.1592161 6 3.85986934 23.00 0.0001
|
dog | 16.9024081 3 5.63413604 33.57 0.0000
time | 6.25680792 3 2.08560264 12.43 0.0015
|
Residual | 1.51054662 9 .167838513
-----------+----------------------------------------------------
Total | 24.6697627 15 1.64465084
Between-subjects error term: dog
Levels: 4 (3 df)
Lowest b.s.e. variable: dog
Repeated variable: time
Huynh-Feldt epsilon = 0.5376
Greenhouse-Geisser epsilon = 0.4061
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
time | 3 12.43 0.0015 0.0138 0.0267 0.0388
Residual | 9
----------------------------------------------------------------
The same results are also easily obtained with the wsanova command.
. wsanova lhist time if group==1, id(dog) epsilon
Number of obs = 16 R-squared = 0.9388
Root MSE = .409681 Adj R-squared = 0.8979
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
dog | 16.9024081 3 5.63413604
time | 6.25680792 3 2.08560264 12.43 0.0015
Residual | 1.51054662 9 .167838513
-----------+----------------------------------------------------
Total | 24.6697627 15 1.64465084
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.4061
Huynh-Feldt (H-F) epsilon: 0.5376
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
time | 3 12.43 0.0015 0.0267 0.0138
You may use
anova lhist dog time if group==1, repeated(time)
or download wsanova and use
wsanova lhist time if group==1, id(dog) epsilon
Both commands provide the same information.
Dial calibration example from the anova manual entry
The example starting on page 53 of [R] anova is taken from table 7.7 of
Winer, Brown, and Michels (1991). By using
tabdisp
we can get a tabular view of the data.
. use http://www.stata-press.com/data/r12/t77, clear
(T7.7 -- Winer, Brown, Michels)
. tabdisp shape subject calib, cell(score)
------------------------------------------------
| 2 methods for calibrating dials and
| subject nested in calib
4 dial | ------- 1 ------ ------- 2 ------
shapes | 1 2 3 1 2 3
----------+-------------------------------------
1 | 0 3 4 4 5 7
2 | 0 1 3 2 4 5
3 | 5 5 6 7 6 8
4 | 3 4 2 8 6 9
------------------------------------------------
I have the data in long form.
. list, sep(4)
+---------------------------------+
| calib subject shape score |
|---------------------------------|
1. | 1 1 1 0 |
2. | 1 1 2 0 |
3. | 1 1 3 5 |
4. | 1 1 4 3 |
|---------------------------------|
5. | 1 2 1 3 |
6. | 1 2 2 1 |
7. | 1 2 3 5 |
8. | 1 2 4 4 |
|---------------------------------|
9. | 1 3 1 4 |
10. | 1 3 2 3 |
11. | 1 3 3 6 |
12. | 1 3 4 2 |
|---------------------------------|
13. | 2 1 1 4 |
14. | 2 1 2 2 |
15. | 2 1 3 7 |
16. | 2 1 4 8 |
|---------------------------------|
17. | 2 2 1 5 |
18. | 2 2 2 4 |
19. | 2 2 3 6 |
20. | 2 2 4 6 |
|---------------------------------|
21. | 2 3 1 7 |
22. | 2 3 2 5 |
23. | 2 3 3 8 |
24. | 2 3 4 9 |
+---------------------------------+
If instead you had the data in a wide format, you would need to use the
reshape command
to get it into long format before using the anova (or wsanova)
command. For an example of using reshape, see the
first example.
You should understand your model before attempting to use anova. For
this dataset, both calib and shape are fixed while
subject is random. The full model includes terms for calib,
subject nested within calib, shape, shape
interacted with calib, and shape interacted with
subject nested within calib. As usual, we let this highest
order term drop and become the residual error. The shape variable is
the repeated variable. This produces an ANOVA with one between-subjects
factor (same underlying design as the next example).
If you were to examine the expected mean squares for this setup (Winer,
Brown, and Michels 1991), you would find the appropriate error term for
the test of calib is subject|calib. The appropriate error
term for shape and shape#calib is shape#subject|calib
(which is the residual error since we do not include the term in the model).
Armed with this information, it becomes easy to specify the correct
anova command.
. anova score calib / subject|calib shape calib#shape, repeated(shape)
Number of obs = 24 R-squared = 0.8925
Root MSE = 1.11181 Adj R-squared = 0.7939
Source | Partial SS df MS F Prob > F
--------------+----------------------------------------------------
Model | 123.125 11 11.1931818 9.06 0.0003
|
calib | 51.0416667 1 51.0416667 11.89 0.0261
subject|calib | 17.1666667 4 4.29166667
--------------+----------------------------------------------------
shape | 47.4583333 3 15.8194444 12.80 0.0005
calib#shape | 7.45833333 3 2.48611111 2.01 0.1662
|
Residual | 14.8333333 12 1.23611111
--------------+----------------------------------------------------
Total | 137.958333 23 5.99818841
Between-subjects error term: subject|calib
Levels: 6 (4 df)
Lowest b.s.e. variable: subject
Covariance pooled over: calib (for repeated variable)
Repeated variable: shape
Huynh-Feldt epsilon = 0.8483
Greenhouse-Geisser epsilon = 0.4751
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
--------------+----------------------------------------------------
shape | 3 12.80 0.0005 0.0011 0.0099 0.0232
calib#shape | 3 2.01 0.1662 0.1791 0.2152 0.2291
Residual | 12
-------------------------------------------------------------------
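The F statistics in the table follow directly from the error terms identified above. As a worked check using the mean squares printed in the output:

```latex
\begin{aligned}
F_{\text{calib}} &= \frac{MS_{\text{calib}}}{MS_{\text{subject}|\text{calib}}}
  = \frac{51.0416667}{4.29166667} \approx 11.89 \\[4pt]
F_{\text{shape}} &= \frac{MS_{\text{shape}}}{MS_{\text{Residual}}}
  = \frac{15.8194444}{1.23611111} \approx 12.80 \\[4pt]
F_{\text{calib\#shape}} &= \frac{MS_{\text{calib\#shape}}}{MS_{\text{Residual}}}
  = \frac{2.48611111}{1.23611111} \approx 2.01
\end{aligned}
```

The test of calib uses 1 and 4 degrees of freedom, while the two within-subjects tests use 3 and 12.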
A common error for users unfamiliar with the underlying model is to list
some variables in the anova command (possibly with some interactions
included) and then get the following error message.
. anova score calib subject shape calib#shape, repeated(shape)
could not determine between-subject error term; use bse() option
r(421);
Stata’s anova command needs the between-subject error term
(here subject|calib) to be included in the model to obtain the
repeated-measures corrections.
The wsanova command (Gleason 1999) seems like a natural alternative
for this example. You might expect to be able to type
. wsanova score shape, id(subject) between(calib) epsilon
epsilon option is invalid with missing data
r(499);
but something went wrong. This dataset has no missing observations. This
is just wsanova’s way of saying it is confused. What
could have caused the confusion? Look at the listing of the data near the
beginning of this example. In particular, pay attention to how the
subject variable is set up. We have subjects going from 1 to
3 for the first level of calib and then going from 1 to 3 again for
the second level of calib. anova was able to handle this, but
wsanova is confused. We can help wsanova out of its confusion
by generating a new variable that gives a unique number to each subject
regardless of which level of calib is involved. We use the
group() function of the
egen command to help us.
. egen z = group(calib subject)
. wsanova score shape, id(z) between(calib) epsilon
Number of obs = 24 R-squared = 0.8925
Root MSE = 1.11181 Adj R-squared = 0.7939
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 51.0416667 1 51.0416667 11.89 0.0261
calib | 51.0416667 1 51.0416667 11.89 0.0261
z*calib | 17.1666667 4 4.29166667
|
Within subjects: | 54.9166667 6 9.15277778 7.40 0.0017
shape | 47.4583333 3 15.8194444 12.80 0.0005
shape*calib | 7.45833333 3 2.48611111 2.01 0.1662
Residual | 14.8333333 12 1.23611111
-----------+----------------------------------------------------
Total | 137.958333 23 5.99818841
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.4751
Huynh-Feldt (H-F) epsilon: 0.8483
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
shape | 3 12.80 0.0005 0.0099 0.0011
shape*calib | 3 2.01 0.1662 0.2152 0.1791
We have been able to reproduce the same results we obtained with
anova. There is one test provided in the output of wsanova
above that is not automatically produced with anova. If you look
back at the ANOVA table produced by wsanova, you will see it
produces an overall test for “Within subjects”. Here it
produces an F of 7.40.
Within subjects: | 54.9166667 6 9.15277778 7.40 0.0017
Using the test command we can easily obtain this same test after
running anova.
. test shape calib#shape
Source | Partial SS df MS F Prob > F
------------------+----------------------------------------------------
shape calib#shape | 54.9166667 6 9.15277778 7.40 0.0017
Residual | 14.8333333 12 1.23611111
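The arithmetic behind this pooled test can be read off the earlier ANOVA table. In this balanced dataset, the pooled sum of squares is the sum of the shape and calib#shape sums of squares, and the degrees of freedom add in the same way:

```latex
\begin{aligned}
SS_{\text{within}} &= 47.4583333 + 7.45833333 = 54.9166667, \qquad
df_{\text{within}} = 3 + 3 = 6 \\[4pt]
MS_{\text{within}} &= \frac{54.9166667}{6} = 9.15277778 \\[4pt]
F &= \frac{MS_{\text{within}}}{MS_{\text{Residual}}}
  = \frac{9.15277778}{1.23611111} \approx 7.40
\end{aligned}
```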
With this example you can either do
anova score calib / subject|calib shape calib#shape , repeated(shape)
or download wsanova (see above for
installation instructions) and do
egen z = group(calib subject)
wsanova score shape, id(z) between(calib) epsilon
Both provide the same information.
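The subject-numbering problem described above can be checked before calling wsanova. The sketch below types in the dial calibration data from the tabdisp display (so it needs no file), flags subject numbers that occur under more than one level of calib, and then confirms that the id built by egen's group() function is nested within calib. The variable name reused is my own.

```stata
* Dial calibration data, typed in wide form from the table above
clear
input calib subject score1 score2 score3 score4
1 1 0 0 5 3
1 2 3 1 5 4
1 3 4 3 6 2
2 1 4 2 7 8
2 2 5 4 6 6
2 3 7 5 8 9
end
reshape long score, i(calib subject) j(shape)

* Flag subject numbers that appear under more than one calib level
bysort subject (calib): generate byte reused = calib[1] != calib[_N]
tabulate subject reused

* Build an id that is unique across groups
egen z = group(calib subject)

* Each value of z should now belong to exactly one calib level
bysort z (calib): assert calib[1] == calib[_N]
```

If reused is 1 for any subject, pass the new variable (here z) to id() instead of the original subject variable.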
One between-subjects factor example from wsanova STB article
The examples in Gleason (1999) demonstrating the wsanova command use a
dataset obtained from Cole and Grizzle (1966). With the net command
(also see help stb), you can obtain the dataset, histamin.dta, as well
as the wsanova command (type net stb 47 followed by net describe sg103,
and then follow the instructions). Gleason’s second example, a one
between-subjects factor ANOVA design, has the same underlying ANOVA
design as the previous example.
Since this example is similar to the previous one, I simply show how you can
obtain the analysis using the anova and wsanova commands
without additional comments. The analysis using anova proceeds just
as it did with our previous example. This time we have lhist
measurements on dogs nested within groups over time.
Following the lead of Gleason (1999), we restrict the data with the
if dog != 6 command qualifier.
. use histamin, clear
(Blood histamine levels in dogs)
. anova lhist group / dog|group time time#group if dog!=6, repeated(time)
Number of obs = 60 R-squared = 0.9709
Root MSE = .27427 Adj R-squared = 0.9479
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 82.6836382 26 3.18013993 42.28 0.0000
|
group | 27.0286268 3 9.00954226 4.07 0.0359
dog|group | 24.3468341 11 2.21334855
-----------+----------------------------------------------------
time | 12.0589871 3 4.01966235 53.44 0.0000
time#group | 17.5232918 9 1.94703243 25.88 0.0000
|
Residual | 2.48238892 33 .075223907
-----------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Between-subjects error term: dog|group
Levels: 15 (11 df)
Lowest b.s.e. variable: dog
Covariance pooled over: group (for repeated variable)
Repeated variable: time
Huynh-Feldt epsilon = 0.8475
Greenhouse-Geisser epsilon = 0.5694
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
time | 3 53.44 0.0000 0.0000 0.0000 0.0000
time#group | 9 25.88 0.0000 0.0000 0.0000 0.0000
Residual | 33
----------------------------------------------------------------
I can obtain the overall within-subjects test as follows:
. test time time#group
Source | Partial SS df MS F Prob > F
----------------+----------------------------------------------------
time time#group | 31.3081774 12 2.60901478 34.68 0.0000
Residual | 2.48238892 33 .075223907
This same analysis is also easy with wsanova:
. wsanova lhist time if dog!=6, id(dog) between(group) epsilon
Number of obs = 60 R-squared = 0.9709
Root MSE = .27427 Adj R-squared = 0.9479
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 27.0286268 3 9.00954226 4.07 0.0359
group | 27.0286268 3 9.00954226 4.07 0.0359
dog*group | 24.3468341 11 2.21334855
|
Within subjects: | 31.3081774 12 2.60901478 34.68 0.0000
time | 12.0589871 3 4.01966235 53.44 0.0000
time*group | 17.5232918 9 1.94703243 25.88 0.0000
Residual | 2.48238892 33 .075223907
-----------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.5694
Huynh-Feldt (H-F) epsilon: 0.8475
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
time | 3 53.44 0.0000 0.0000 0.0000
time*group | 9 25.88 0.0000 0.0000 0.0000
This example has the dogs numbered from 1 to 16, so (unlike the
previous example) there is no need to generate a new id() variable
for the wsanova command.
For this example, you can pick between running
anova lhist group / dog|group time time#group if dog != 6, repeated(time)
and downloading wsanova and running
wsanova lhist time if dog != 6, id(dog) between(group) epsilon
to obtain the results.
Two between-subjects factors example from wsanova STB article
The third example in Gleason (1999) demonstrating the wsanova command
also uses the histamin.dta dataset obtained from Cole and Grizzle
(1966). This example expands from the previous
example by splitting the group variable, which has four levels,
into two variables, depleted and drug, each with two levels
corresponding to a 2 × 2 factorial. We end up having two
between-subject factors plus their interaction. Again, following the lead
of Gleason (1999), we restrict the data with the if dog != 6 command
qualifier.
Here is the result of running wsanova on this dataset:
. use histamin, clear
(Blood histamine levels in dogs)
. wsanova lhist time if dog!=6, id(dog) between(drug depl drug*depl) eps
Number of obs = 60 R-squared = 0.9709
Root MSE = .27427 Adj R-squared = 0.9479
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 27.0286268 3 9.00954226 4.07 0.0359
drug | 5.99336256 1 5.99336256 2.71 0.1281
depleted | 15.4484076 1 15.4484076 6.98 0.0229
drug*depleted | 4.69087549 1 4.69087549 2.12 0.1734
dog*drug*depleted | 24.3468341 11 2.21334855
|
Within subjects: | 31.3081774 12 2.60901478 34.68 0.0000
time | 12.0589871 3 4.01966235 53.44 0.0000
time*drug | 1.84429539 3 .614765129 8.17 0.0003
time*depleted | 12.0897855 3 4.02992849 53.57 0.0000
time*drug*depleted | 2.93077944 3 .976926479 12.99 0.0000
Residual | 2.48238892 33 .075223907
-----------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.5694
Huynh-Feldt (H-F) epsilon: 0.8475
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
time | 3 53.44 0.0000 0.0000 0.0000
time*drug | 3 8.17 0.0003 0.0039 0.0008
time*depleted | 3 53.57 0.0000 0.0000 0.0000
time*drug*depleted | 3 12.99 0.0000 0.0005 0.0000
The anova command with the repeated() option can also be used
on this problem:
. anova lhist drug dep drug#dep / dog|drug#dep time time#drug time#dep time#drug#dep
> if dog!=6, rep(time)
Number of obs = 60 R-squared = 0.9709
Root MSE = .27427 Adj R-squared = 0.9479
Source | Partial SS df MS F Prob > F
-------------------+----------------------------------------------------
Model | 82.6836382 26 3.18013993 42.28 0.0000
|
drug | 6.1513201 1 6.1513201 2.78 0.1237
depleted | 15.712679 1 15.712679 7.10 0.0220
drug#depleted | 4.69087549 1 4.69087549 2.12 0.1734
dog|drug#depleted | 24.3468341 11 2.21334855
-------------------+----------------------------------------------------
time | 12.0589871 3 4.01966235 53.44 0.0000
time#drug | 1.84429539 3 .614765129 8.17 0.0003
time#depleted | 12.0897855 3 4.02992849 53.57 0.0000
time#drug#depleted | 2.93077944 3 .976926479 12.99 0.0000
|
Residual | 2.48238892 33 .075223907
-------------------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Between-subjects error term: dog|drug#depleted
Levels: 15 (11 df)
Lowest b.s.e. variable: dog
Covariance pooled over: drug#depleted (for repeated variable)
Repeated variable: time
Huynh-Feldt epsilon = 0.8475
Greenhouse-Geisser epsilon = 0.5694
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-------------------+----------------------------------------------------
time | 3 53.44 0.0000 0.0000 0.0000 0.0000
time#drug | 3 8.17 0.0003 0.0008 0.0039 0.0156
time#depleted | 3 53.57 0.0000 0.0000 0.0000 0.0000
time#drug#depleted | 3 12.99 0.0000 0.0000 0.0005 0.0041
Residual | 33
------------------------------------------------------------------------
If you look closely, you will find a difference in the results for the
drug and the depleted terms between anova and
wsanova. This is due to the imbalance in the data from excluding the
observations associated with the sixth dog.
. tabulate drug depleted if dog!=6
Drug | Depleted pre-test
administer | histamines?
ed | No Yes | Total
-----------+----------------------+----------
Morphine | 16 12 | 28
TriMeth | 16 16 | 32
-----------+----------------------+----------
Total | 32 28 | 60
The wsanova command actually performs its work with two separate
calls to anova instead of getting the whole ANOVA table at one time.
The anova command with the repeated() option computes the
complete model in one estimation. With unbalanced data, this difference
in method can sometimes change the results. In these cases, I
recommend using the anova command.
Gleason (1999) also shows for this example how to use the wonly()
option in conjunction with the between() option of wsanova to
control which terms end up in the ANOVA table.
. wsanova lhist time if dog!=6, id(dog) between(drug depl) wonly(time time*depl) epsilon
Number of obs = 60 R-squared = 0.9103
Root MSE = .442692 Adj R-squared = 0.8642
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 22.3377513 2 11.1688756 4.62 0.0326
drug | 6.87754936 1 6.87754936 2.84 0.1176
depleted | 16.8857304 1 16.8857304 6.98 0.0215
dog*drug*depleted | 29.0377096 12 2.41980913
|
Within subjects: | 26.1474934 6 4.35791556 22.24 0.0000
time | 12.0454347 3 4.0151449 20.49 0.0000
time*depleted | 12.3626079 3 4.12086929 21.03 0.0000
Residual | 7.64307289 39 .195976228
-----------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.5694
Huynh-Feldt (H-F) epsilon: 0.7651
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
time | 3 20.49 0.0000 0.0000 0.0000
time*depleted | 3 21.03 0.0000 0.0000 0.0000
We can, of course, obtain the same results directly with anova.
. anova lhist drug depl / dog|drug#depl time time#depl if dog!=6, rep(time)
Number of obs = 60 R-squared = 0.9103
Root MSE = .442692 Adj R-squared = 0.8642
Source | Partial SS df MS F Prob > F
------------------+----------------------------------------------------
Model | 77.5229542 20 3.87614771 19.78 0.0000
|
drug | 6.87754936 1 6.87754936 2.84 0.1176
depleted | 16.8857304 1 16.8857304 6.98 0.0215
dog|drug#depleted | 29.0377096 12 2.41980913
------------------+----------------------------------------------------
time | 12.0454347 3 4.0151449 20.49 0.0000
time#depleted | 12.3626079 3 4.12086929 21.03 0.0000
|
Residual | 7.64307289 39 .195976228
------------------+----------------------------------------------------
Total | 85.1660271 59 1.44349199
Between-subjects error term: dog|drug#depleted
Levels: 15 (12 df)
Lowest b.s.e. variable: dog
Covariance pooled over: drug#depleted (for repeated variable)
Repeated variable: time
Huynh-Feldt epsilon = 0.7651
Greenhouse-Geisser epsilon = 0.5694
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
------------------+----------------------------------------------------
time | 3 20.49 0.0000 0.0000 0.0000 0.0006
time#depleted | 3 21.03 0.0000 0.0000 0.0000 0.0005
Residual | 39
-----------------------------------------------------------------------
As with the previous examples, it is important to understand your model and
to make sure to include the between-subjects error term in the model. Here
it is the term dog|drug#depleted. The wsanova command puts
this term (labeled as dog*drug*depleted) into the model automatically based
on the options you specify.
This example shows that, for unbalanced data, wsanova and anova can
sometimes report different results for some of the terms in the
ANOVA table. In these cases, you should rely on the anova command.
Another two between-subjects factors example
This example is taken from the data of table 7.22 of Winer, Brown, and
Michels (1991) and has a similar underlying structure to that of the
previous example.
For this example, we have an experiment on a learning task with the
variables anxiety and tension, each at two levels in a
factorial layout. Nested within this interaction is subject. These
are the variables involved in the between-subjects portion of our ANOVA.
There are four trials—our repeated variable. We are also
interested in examining the interaction of trial with the other terms
in the model.
Here is a tabular view of the data:
. use t722, clear
(T7.22 -- Winer, Brown, Michels)
. tabdisp subject trial, by(anxiety tension) c(response) concise stubw(10)
-----------------------------------
effect of |
anxiety -- |
2 levels, |
muscular |
tension -- |
2 levels |
and | trial
subject | 1 2 3 4
-----------+-----------------------
1 |
1 |
1 | 18 14 12 6
2 | 19 12 8 4
3 | 14 10 6 2
-----------+-----------------------
1 |
2 |
4 | 16 12 10 4
5 | 12 8 6 2
6 | 18 10 5 1
-----------+-----------------------
2 |
1 |
7 | 16 10 8 4
8 | 18 8 4 1
9 | 16 12 6 2
-----------+-----------------------
2 |
2 |
10 | 19 16 10 8
11 | 16 14 10 9
12 | 16 12 8 8
-----------------------------------
In the following anova command, I take advantage of Stata’s
ability to allow abbreviations for the variable names.
. anova response an te an#te / su|an#te tr an#tr te#tr an#te#tr, rep(tr)
Number of obs = 48 R-squared = 0.9585
Root MSE = 1.47432 Adj R-squared = 0.9188
Source | Partial SS df MS F Prob > F
------------------------+----------------------------------------------------
Model | 1205.83333 23 52.4275362 24.12 0.0000
|
anxiety | 10.0833333 1 10.0833333 0.98 0.3517
tension | 8.33333333 1 8.33333333 0.81 0.3949
anxiety#tension | 80.0833333 1 80.0833333 7.77 0.0237
subject|anxiety#tension | 82.5 8 10.3125
------------------------+----------------------------------------------------
trial | 991.5 3 330.5 152.05 0.0000
anxiety#trial | 8.41666667 3 2.80555556 1.29 0.3003
tension#trial | 12.1666667 3 4.05555556 1.87 0.1624
anxiety#tension#trial | 12.75 3 4.25 1.96 0.1477
|
Residual | 52.1666667 24 2.17361111
------------------------+----------------------------------------------------
Total | 1258 47 26.7659574
Between-subjects error term: subject|anxiety#tension
Levels: 12 (8 df)
Lowest b.s.e. variable: subject
Covariance pooled over: anxiety#tension (for repeated variable)
Repeated variable: trial
Huynh-Feldt epsilon = 0.9023
Greenhouse-Geisser epsilon = 0.5361
Box's conservative epsilon = 0.3333
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
------------------------+----------------------------------------------------
trial | 3 152.05 0.0000 0.0000 0.0000 0.0000
anxiety#trial | 3 1.29 0.3003 0.3015 0.3002 0.2888
tension#trial | 3 1.87 0.1624 0.1693 0.1967 0.2091
anxiety#tension#trial | 3 1.96 0.1477 0.1550 0.1847 0.1996
Residual | 24
-----------------------------------------------------------------------------
The wsanova command (Gleason 1999) can also be used for this example.
. wsanova response trial, id(subject) between(anx tens anx*tens) epsilon
Number of obs = 48 R-squared = 0.9585
Root MSE = 1.47432 Adj R-squared = 0.9188
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 98.5 3 32.8333333 3.18 0.0845
anxiety | 10.0833333 1 10.0833333 0.98 0.3517
tension | 8.33333333 1 8.33333333 0.81 0.3949
anxiety*tension | 80.0833333 1 80.0833333 7.77 0.0237
subject*anxiety*tension | 82.5 8 10.3125
|
Within subjects: | 1024.83333 12 85.4027778 39.29 0.0000
trial | 991.5 3 330.5 152.05 0.0000
trial*anxiety | 8.41666667 3 2.80555556 1.29 0.3003
trial*tension | 12.1666667 3 4.05555556 1.87 0.1624
trial*anxiety*tension | 12.75 3 4.25 1.96 0.1477
Residual | 52.1666667 24 2.17361111
-----------+----------------------------------------------------
Total | 1258 47 26.7659574
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.5361
Huynh-Feldt (H-F) epsilon: 0.9023
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
trial | 3 152.05 0.0000 0.0000 0.0000
trial*anxiety | 3 1.29 0.3003 0.3002 0.3015
trial*tension | 3 1.87 0.1624 0.1967 0.1693
trial*anxiety*tension | 3 1.96 0.1477 0.1847 0.1550
You can choose between
anova response an te an#te / su|an#te tr an#tr te#tr an#te#tr , rep(tr)
or download wsanova (see above for
installation instructions) and then type
wsanova response trial, id(subject) between(anx tens anx*tens) epsilon
A complicated design with one repeated variable
Table 9–11 of Myers (1966) presents an interesting dataset with factor
A having two levels, G (representing groups) nested within
A (a total of four groups), factor B with two levels that is
crossed with A and G|A, S (representing subjects)
nested within all of this (S|B#G|A) for a total of 16 subjects,
then factor C, the repeated-measures variable with three levels.
Each of the 16 subjects has measures for the three levels of C. The
interaction of C with the other terms is also included in the model.
Here is a look at the data:
. use tm911, clear
(Table 9-11 in Myers)
. list, sep(12)
+--------------------------+
| A G B S C res |
|--------------------------|
1. | 1 1 1 1 1 4 |
2. | 1 1 1 1 2 5 |
3. | 1 1 1 1 3 8 |
4. | 1 1 1 2 1 3 |
5. | 1 1 1 2 2 6 |
6. | 1 1 1 2 3 10 |
7. | 1 1 2 3 1 3 |
8. | 1 1 2 3 2 6 |
9. | 1 1 2 3 3 10 |
10. | 1 1 2 4 1 4 |
11. | 1 1 2 4 2 5 |
12. | 1 1 2 4 3 9 |
|--------------------------|
13. | 1 2 1 5 1 4 |
14. | 1 2 1 5 2 7 |
15. | 1 2 1 5 3 8 |
16. | 1 2 1 6 1 3 |
17. | 1 2 1 6 2 6 |
18. | 1 2 1 6 3 9 |
19. | 1 2 2 7 1 1 |
20. | 1 2 2 7 2 6 |
21. | 1 2 2 7 3 8 |
22. | 1 2 2 8 1 4 |
23. | 1 2 2 8 2 2 |
24. | 1 2 2 8 3 12 |
|--------------------------|
25. | 2 3 1 9 1 7 |
26. | 2 3 1 9 2 7 |
27. | 2 3 1 9 3 11 |
28. | 2 3 1 10 1 4 |
29. | 2 3 1 10 2 8 |
30. | 2 3 1 10 3 14 |
31. | 2 3 2 11 1 9 |
32. | 2 3 2 11 2 8 |
33. | 2 3 2 11 3 16 |
34. | 2 3 2 12 1 7 |
35. | 2 3 2 12 2 10 |
36. | 2 3 2 12 3 19 |
|--------------------------|
37. | 2 4 1 13 1 3 |
38. | 2 4 1 13 2 5 |
39. | 2 4 1 13 3 9 |
40. | 2 4 1 14 1 2 |
41. | 2 4 1 14 2 7 |
42. | 2 4 1 14 3 8 |
43. | 2 4 2 15 1 10 |
44. | 2 4 2 15 2 12 |
45. | 2 4 2 15 3 13 |
46. | 2 4 2 16 1 9 |
47. | 2 4 2 16 2 11 |
48. | 2 4 2 16 3 15 |
+--------------------------+
Myers (1966) indicates that for this example the ANOVA table should have the
following structure:
Model Term | F-Test
-----------------+-----------------------------
Between S |
Between G |
A | MS(A) / MS(G|A)
G|A |
Within G |
B | MS(B) / MS(B#G|A)
B#A | MS(B#A) / MS(B#G|A)
B#G|A | MS(B#G|A) / MS(S|B#G|A)
S|B#G|A |
Within S |
C | MS(C) / MS(C#G|A)
C#A | MS(C#A) / MS(C#G|A)
C#G|A | MS(C#G|A) / MS(C#B#G|A)
C#B | MS(C#B) / MS(C#B#G|A)
C#B#A | MS(C#B#A) / MS(C#B#G|A)
C#B#G|A | MS(C#B#G|A) / MS(C#S|B#G|A)
C#S|B#G|A |
-----------------+-----------------------------
How did Myers (1966) determine the appropriate mean square to use in the
denominator of each of the F tests listed above? He first determined which
factors were fixed and which were random, and which factors were nested and
which were crossed. Then, from that, he derived the expected mean squares
for each term. From these he could see which terms were the appropriate
error terms for other terms in the model. See Winer, Brown, and Michels
(1991) or some other good book on ANOVA modeling to understand “fixed
factors”, “random factors”, “nesting”,
“crossing”, “expected mean squares”, etc.
The anova command allows the “/” notation that
indicates the terms to the left of the slash are to be tested using the
term to the right of the slash as the error term. This method makes it easy
to get all but one of the F tests from the complicated ANOVA table
above with one call to anova. The remaining F test (the test
for the C#G|A term) is easily obtained with a call to the test
command after running anova. Again, I drop the largest possible
interaction term (C#S|B#G|A) so that the residual (which would have
had zero degrees of freedom if the term were left in the model) becomes that
interaction term.
. anova res A / G|A B B#A / B#G|A / S|B#G|A C C#A / C#G|A C#B C#B#A / C#B#G|A / , rep(C)
Number of obs = 48 R-squared = 0.9346
Root MSE = 1.70171 Adj R-squared = 0.8080
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 662.645833 31 21.375672 7.38 0.0001
|
A | 136.6875 1 136.6875 24.76 0.0381
G|A | 11.0416667 2 5.52083333
-----------+----------------------------------------------------
B | 54.1875 1 54.1875 7.45 0.1121
B#A | 67.6875 1 67.6875 9.31 0.0927
B#G|A | 14.5416667 2 7.27083333
-----------+----------------------------------------------------
B#G|A | 14.5416667 2 7.27083333 13.96 0.0025
S|B#G|A | 4.16666667 8 .520833333
-----------+----------------------------------------------------
C | 337.166667 2 168.583333 34.88 0.0029
C#A | 1.5 2 .75 0.16 0.8612
C#G|A | 19.3333333 4 4.83333333
-----------+----------------------------------------------------
C#B | 8 2 4 2.04 0.2448
C#B#A | .5 2 .25 0.13 0.8836
C#B#G|A | 7.83333333 4 1.95833333
-----------+----------------------------------------------------
C#B#G|A | 7.83333333 4 1.95833333 0.68 0.6182
|
Residual | 46.3333333 16 2.89583333
-----------+----------------------------------------------------
Total | 708.979167 47 15.0846631
Between-subjects error term: S|B#G|A
Levels: 16 (8 df)
Lowest b.s.e. variable: S
Covariance pooled over: B#G|A (for repeated variable)
Repeated variable: C
Huynh-Feldt epsilon = 2.4863
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.9961
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
C | 2 34.88 0.0029 0.0029 0.0030 0.0275
C#A | 2 0.16 0.8612 0.8612 0.8605 0.7317
C#G|A | 4
-----------+----------------------------------------------------
C#B | 2 2.04 0.2448 0.2448 0.2451 0.2892
C#B#A | 2 0.13 0.8836 0.8836 0.8830 0.7551
C#B#G|A | 4
-----------+----------------------------------------------------
C#B#G|A | 4 0.68 0.6182 0.6182 0.6177 0.5354
Residual | 16
----------------------------------------------------------------
. test C#G|A / C#B#G|A
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
C#G|A | 19.3333333 4 4.83333333 2.47 0.2015
C#B#G|A | 7.83333333 4 1.95833333
The wsanova command (Gleason 1999) can produce the appropriate mean
squares for the terms in the model but will not be able to automatically
create the correct F tests for most of the terms. It does not
understand all of the structure of this complicated model. Here is what you
can obtain from wsanova:
. wsanova res C, id(S) between(A G*A B B*A B*G*A) epsilon
Number of obs = 48 R-squared = 0.9346
Root MSE = 1.70171 Adj R-squared = 0.8080
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Between subjects: | 284.145833 7 40.5922619 77.94 0.0000
A | 136.6875 1 136.6875 262.44 0.0000
G*A | 11.0416667 2 5.52083333 10.60 0.0056
B | 54.1875 1 54.1875 104.04 0.0000
B*A | 67.6875 1 67.6875 129.96 0.0000
B*G*A | 14.5416667 2 7.27083333 13.96 0.0025
S*A*G*B | 4.16666667 8 .520833333
|
Within subjects: | 374.333333 16 23.3958333 8.08 0.0001
C | 337.166667 2 168.583333 58.22 0.0000
C*A | 1.5 2 .75 0.26 0.7750
C*G*A | 19.3333333 4 4.83333333 1.67 0.2061
C*B | 8 2 4 1.38 0.2797
C*B*A | .5 2 .25 0.09 0.9177
C*B*G*A | 7.83333333 4 1.95833333 0.68 0.6182
Residual | 46.3333333 16 2.89583333
-----------+----------------------------------------------------
Total | 708.979167 47 15.0846631
Note: Within subjects F-test(s) above assume sphericity of residuals;
p-values corrected for lack of sphericity appear below.
Greenhouse-Geisser (G-G) epsilon: 0.9961
Huynh-Feldt (H-F) epsilon: 1.0000
Sphericity G-G H-F
Source | df F Prob > F Prob > F Prob > F
-----------+----------------------------------------------------
C | 2 58.22 0.0000 0.0000 0.0000
C*A | 2 0.26 0.7750 0.7742 0.7750
C*G*A | 4 1.67 0.2061 0.2064 0.2061
C*B | 2 1.38 0.2797 0.2797 0.2797
C*B*A | 2 0.09 0.9177 0.9171 0.9177
C*B*G*A | 4 0.68 0.6182 0.6177 0.6182
Remember that for this complicated ANOVA you should ignore most of the
F tests produced in the output from the wsanova command.
Instead, you need to compute the correct F tests yourself from the
mean squares in the ANOVA table after running wsanova. Using the
anova command and taking advantage of the “/”
notation gives you the appropriate F tests directly in the ANOVA
table.
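As a rough sketch of what computing the F tests by hand involves, the lines below take three mean squares from the wsanova table above, divide each by the error term that the Myers (1966) layout prescribes, and use Stata's Ftail() function for the p-value. The scalar names (F_A, F_B, F_C) are made up for this illustration. The results should agree with what anova reports directly, for example F = 24.76 and p = 0.0381 for A.

```stata
* A is tested against G|A (1 and 2 df)
scalar F_A = 136.6875 / 5.52083333
display "A:  F = " %6.2f F_A "  p = " %6.4f Ftail(1, 2, F_A)

* B is tested against B#G|A (1 and 2 df)
scalar F_B = 54.1875 / 7.27083333
display "B:  F = " %6.2f F_B "  p = " %6.4f Ftail(1, 2, F_B)

* C is tested against C#G|A (2 and 4 df)
scalar F_C = 168.583333 / 4.83333333
display "C:  F = " %6.2f F_C "  p = " %6.4f Ftail(2, 4, F_C)
```

These are the regular (sphericity-assumed) p-values only. Any epsilon-corrected p-values for the terms involving C would have to be recomputed separately.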
If you did not understand the underlying model for this example and just
tried entering variable names into the anova command hoping
something good would come out, you would most likely be disappointed. While
understanding the underlying model is helpful with simple problems, it
becomes crucial with more complicated designs.
Examples with two or more repeated variables
Shown below are three examples of repeated-measures ANOVAs where the
subjects have repeated observations over more than one variable. In the
previous section of this document, I outlined the use of both
anova and wsanova (Gleason 1999). With more than one
repeated-measures variable, the anova command is the only choice.
No between-subjects factors with two repeated variables
This example is obtained by restricting the data from the
next example to only one level of the
between-subjects variable. This choice produces an example with no
between-subjects factors and two repeated variables. The data come from
table 7.13 of Winer, Brown, and Michels (1991). After keeping only those
observations of interest to this example, we have three subjects, each with
nine accuracy scores on all combinations of the three different dials and
three different periods. With subject a random factor and both
dial and period fixed factors, the appropriate error term for
the test of dial is the dial#subject interaction. Likewise,
period#subject is the correct error term for period, and
period#dial#subject (which we will drop so that it becomes residual
error) is the appropriate error term for period#dial.
Here are the data:
. use http://www.stata-press.com/data/r12/t713, clear
(T7.13 -- Winer, Brown, Michels)
. keep if noise==1
(27 observations deleted)
. drop noise
. label var subject ""
. tabdisp subject dial period, cell(score)
--------------------------------------------------------------------
| 10 minute time periods and dial
| ------- 1 ------ ------- 2 ------ ------- 3 ------
subject | 1 2 3 1 2 3 1 2 3
----------+---------------------------------------------------------
1 | 45 53 60 40 52 57 28 37 46
2 | 35 41 50 30 37 47 25 32 41
3 | 60 65 75 58 54 70 40 47 50
--------------------------------------------------------------------
By specifying both the period and dial variables in the
repeated() option of anova along with appropriate use of the
“/” notation for specifying the proper error terms in the
model, we can easily obtain the desired ANOVA table.
. anova score subject period / subject#period dial / subject#dial period#dial, repeated(peri
> od dial)
Number of obs = 27 R-squared = 0.9871
Root MSE = 2.60342 Adj R-squared = 0.9580
Source | Partial SS df MS F Prob > F
---------------+----------------------------------------------------
Model | 4146.44444 18 230.358025 33.99 0.0000
|
subject | 1828.22222 2 914.111111 29.54 0.0040
period | 1124.66667 2 562.333333 18.17 0.0098
subject#period | 123.777778 4 30.9444444
---------------+----------------------------------------------------
dial | 1020.66667 2 510.333333 51.32 0.0014
subject#dial | 39.7777778 4 9.94444444
---------------+----------------------------------------------------
period#dial | 9.33333333 4 2.33333333 0.34 0.8410
|
Residual | 54.2222222 8 6.77777778
---------------+----------------------------------------------------
Total | 4200.66667 26 161.564103
Between-subjects error term: subject
Levels: 3 (2 df)
Lowest b.s.e. variable: subject
Repeated variable: period
Huynh-Feldt epsilon = 0.6829
Greenhouse-Geisser epsilon = 0.5419
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------+----------------------------------------------------
period | 2 18.17 0.0098 0.0275 0.0441 0.0509
subject#period | 4
--------------------------------------------------------------------
Repeated variable: dial
Huynh-Feldt epsilon = 0.7129
Greenhouse-Geisser epsilon = 0.5481
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------+----------------------------------------------------
dial | 2 51.32 0.0014 0.0062 0.0147 0.0189
subject#dial | 4
--------------------------------------------------------------------
Repeated variables: period#dial
Huynh-Feldt epsilon = 0.2631
Greenhouse-Geisser epsilon = 0.2532
Box's conservative epsilon = 0.2500
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------+----------------------------------------------------
period#dial | 4 0.34 0.8410 0.6246 0.6187 0.6168
Residual | 8
--------------------------------------------------------------------
The test on subject in the main ANOVA table should be ignored.
With multiple repeated variables we obtain the various epsilon corrections
(Greenhouse–Geisser, Huynh–Feldt, Box’s conservative
epsilon) to the p-values for each repeated variable and each
interaction of those repeated variables.
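The corrected p-values can be checked by hand. The sketch below assumes the usual form of these corrections, in which both the numerator and denominator degrees of freedom are multiplied by epsilon before the F tail probability is computed. It uses the period row of the output above. The scalar names are illustrative only, and the epsilons are typed in from the reported (rounded) values, so the results should be close to, but may not exactly match, the 0.0098, 0.0275, 0.0441, and 0.0509 shown in the table.

```stata
* F for period: MS(period) / MS(subject#period), with 2 and 4 df
scalar F_obs  = 562.333333 / 30.9444444
scalar df_num = 2
scalar df_den = 4

* Regular p-value, then each correction with both df scaled by epsilon
display "Regular: " %6.4f Ftail(df_num, df_den, F_obs)
display "H-F:     " %6.4f Ftail(0.6829*df_num, 0.6829*df_den, F_obs)
display "G-G:     " %6.4f Ftail(0.5419*df_num, 0.5419*df_den, F_obs)
display "Box:     " %6.4f Ftail(0.5000*df_num, 0.5000*df_den, F_obs)
```

When the Huynh-Feldt epsilon exceeds 1, anova resets it to 1, so the H-F column then equals the regular p-value.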
One between-subjects factor with two repeated-variables example from the anova manual entry
This example can be found starting on page 56 of [R] anova. The data
are from table 7.13 of Winer, Brown, and Michels (1991). There is one
between-subject factor, noise, with two levels. There are three
subjects nested within each level of noise. As with the previous example,
there are two repeated variables,
period and dial, each with three levels, so that each subject
has nine values recorded. Details of this dataset and the underlying model
can be found in [R] anova and in Winer, Brown, and Michels (1991).
Here are the data:
. use http://www.stata-press.com/data/r12/t713, clear
(T7.13 -- Winer, Brown, Michels)
. tabdisp subject dial period, by(noise) cell(score) stubwidth(11)
----------------------------------------------------------------------
noise |
background |
and subject | 10 minute time periods and dial
nested in | ------- 1 ------ ------- 2 ------ ------- 3 ------
noise | 1 2 3 1 2 3 1 2 3
------------+---------------------------------------------------------
1 |
1 | 45 53 60 40 52 57 28 37 46
2 | 35 41 50 30 37 47 25 32 41
3 | 60 65 75 58 54 70 40 47 50
------------+---------------------------------------------------------
2 |
1 | 50 48 61 25 34 51 16 23 35
2 | 42 45 55 30 37 43 22 27 37
3 | 56 60 77 40 39 57 31 29 46
----------------------------------------------------------------------
Here are the ANOVA results for these data:
. anova score noise / subject|noise period noise#period / period#subject|noise dial
> noise#dial / dial#subject|noise period#dial noise#period#dial, repeated(period dial)
Number of obs = 54 R-squared = 0.9872
Root MSE = 2.81859 Adj R-squared = 0.9576
Source | Partial SS df MS F Prob > F
---------------------+----------------------------------------------------
Model | 9797.72222 37 264.803303 33.33 0.0000
|
noise | 468.166667 1 468.166667 0.75 0.4348
subject|noise | 2491.11111 4 622.777778
---------------------+----------------------------------------------------
period | 3722.33333 2 1861.16667 63.39 0.0000
noise#period | 333 2 166.5 5.67 0.0293
period#subject|noise | 234.888889 8 29.3611111
---------------------+----------------------------------------------------
dial | 2370.33333 2 1185.16667 89.82 0.0000
noise#dial | 50.3333333 2 25.1666667 1.91 0.2102
dial#subject|noise | 105.555556 8 13.1944444
---------------------+----------------------------------------------------
period#dial | 10.6666667 4 2.66666667 0.34 0.8499
noise#period#dial | 11.3333333 4 2.83333333 0.36 0.8357
|
Residual | 127.111111 16 7.94444444
---------------------+----------------------------------------------------
Total | 9924.83333 53 187.261006
Between-subjects error term: subject|noise
Levels: 6 (4 df)
Lowest b.s.e. variable: subject
Covariance pooled over: noise (for repeated variables)
Repeated variable: period
Huynh-Feldt epsilon = 1.0668
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.6476
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------------+----------------------------------------------------
period | 2 63.39 0.0000 0.0000 0.0003 0.0013
noise#period | 2 5.67 0.0293 0.0293 0.0569 0.0759
period#subject|noise | 8
--------------------------------------------------------------------------
Repeated variable: dial
Huynh-Feldt epsilon = 2.0788
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.9171
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------------+----------------------------------------------------
dial | 2 89.82 0.0000 0.0000 0.0000 0.0007
noise#dial | 2 1.91 0.2102 0.2102 0.2152 0.2394
dial#subject|noise | 8
--------------------------------------------------------------------------
Repeated variables: period#dial
Huynh-Feldt epsilon = 1.3258
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.5134
Box's conservative epsilon = 0.2500
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
---------------------+----------------------------------------------------
period#dial | 4 0.34 0.8499 0.8499 0.7295 0.5934
noise#period#dial | 4 0.36 0.8357 0.8357 0.7156 0.5825
Residual | 16
--------------------------------------------------------------------------
Again we see that in addition to the main ANOVA table we obtain an adjusted
table for each repeated variable (and their interaction). This result gives
the epsilon adjustments to the p-values for those terms in the model
involving the repeated-measures variable(s).
A complicated design with two repeated variables
This example is an expanded version of the last example in the single
repeated-variable section of this document (a
complicated design with one repeated variable). The original data and
example were taken from table 9–11 of Myers (1966). I added another
repeated-measures variable, D, with three levels (thus expanding the
data by a factor of three). I created a fake res variable to replace
the one provided in table 9–11 of Myers (1966). The new model is much
larger than the original since D is interacted with all of the other
terms in the model.
Here is part of the data:
. list, sep(12)
+------------------------------+
| A G B S C D res |
|------------------------------|
1. | 1 1 1 1 1 1 22 |
2. | 1 1 1 1 1 2 23 |
3. | 1 1 1 1 1 3 29 |
4. | 1 1 1 1 2 1 28 |
5. | 1 1 1 1 2 2 30 |
6. | 1 1 1 1 2 3 34 |
7. | 1 1 1 1 3 1 41 |
8. | 1 1 1 1 3 2 42 |
9. | 1 1 1 1 3 3 45 |
10. | 1 1 1 2 1 1 15 |
11. | 1 1 1 2 1 2 19 |
12. | 1 1 1 2 1 3 15 |
|------------------------------|
13. | 1 1 1 2 2 1 31 |
14. | 1 1 1 2 2 2 31 |
15. | 1 1 1 2 2 3 30 |
...
|------------------------------|
133. | 2 4 2 15 3 1 67 |
134. | 2 4 2 15 3 2 67 |
135. | 2 4 2 15 3 3 71 |
136. | 2 4 2 16 1 1 48 |
137. | 2 4 2 16 1 2 51 |
138. | 2 4 2 16 1 3 48 |
139. | 2 4 2 16 2 1 56 |
140. | 2 4 2 16 2 2 61 |
141. | 2 4 2 16 2 3 60 |
142. | 2 4 2 16 3 1 76 |
143. | 2 4 2 16 3 2 75 |
144. | 2 4 2 16 3 3 78 |
+------------------------------+
Following the lead of Myers (1966), I want to create an ANOVA table with the
following information:
Model Term | F-Test
-------------------+---------------------------------
Between S |
Between G |
A | MS(A) / MS(G|A)
G|A |
Within G |
B | MS(B) / MS(B#G|A)
B#A | MS(B#A) / MS(B#G|A)
B#G|A | MS(B#G|A) / MS(S|B#G|A)
S|B#G|A |
Within S |
-------------+---------------------------------
C | MS(C) / MS(C#G|A)
C#A | MS(C#A) / MS(C#G|A)
C#G|A | MS(C#G|A) / MS(C#B#G|A)
C#B | MS(C#B) / MS(C#B#G|A)
C#B#A | MS(C#B#A) / MS(C#B#G|A)
C#B#G|A | MS(C#B#G|A) / MS(C#S|B#G|A)
C#S|B#G|A |
-------------+---------------------------------
D | MS(D) / MS(D#G|A)
D#A | MS(D#A) / MS(D#G|A)
D#G|A | MS(D#G|A) / MS(D#B#G|A)
D#B | MS(D#B) / MS(D#B#G|A)
D#B#A | MS(D#B#A) / MS(D#B#G|A)
D#B#G|A | MS(D#B#G|A) / MS(D#S|B#G|A)
D#S|B#G|A |
-------------+---------------------------------
D#C | MS(D#C) / MS(D#C#G|A)
D#C#A | MS(D#C#A) / MS(D#C#G|A)
D#C#G|A | MS(D#C#G|A) / MS(D#C#B#G|A)
D#C#B | MS(D#C#B) / MS(D#C#B#G|A)
D#C#B#A | MS(D#C#B#A) / MS(D#C#B#G|A)
D#C#B#G|A | MS(D#C#B#G|A) / MS(D#C#S|B#G|A)
D#C#S|B#G|A |
-------------------+---------------------------------
By writing the anova model in natural order (see above) and using the
“/” notation, I can get all but three of the tests
outlined above with one call to anova. The other three tests (on
C#G|A, D#G|A, and D#C#G|A) can be obtained using the
test command.
As more terms are added to the model, the matsize must be
set higher to accommodate the larger model. Here I had to set
matsize to 2322. Large designs may also take a while to run.
Depending on the speed of your computer, you will probably
see Stata pause for a while, print a few lines of output,
and then pause again. This is normal behavior.
Here is the anova run:
. set matsize 2322
Current memory allocation
current memory usage
settable value description (1M = 1024k)
--------------------------------------------------------------------
set maxvar 5000 max. variables allowed 1.909M
set memory 50M max. data space 50.000M
set matsize 2322 max. RHS vars in models 41.330M
-----------
93.239M
. anova res A / G|A B B#A / B#G|A / S|B#G|A C C#A / C#G|A C#B C#B#A / C#B#G|A / C#S|B#G|A D
> D#A / D#G|A D#B D#B#A / D#B#G|A / D#S|B#G|A D#C D#C#A / D#C#G|A D#C#B D#C#B#A / D#C#B#G|A
> / , repeated(C D)
Number of obs = 144 R-squared = 0.9966
Root MSE = 2.40875 Adj R-squared = 0.9848
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 54466.9722 111 490.693443 84.57 0.0000
|
A | 10201 1 10201 23.46 0.0401
G|A | 869.805556 2 434.902778
-----------+----------------------------------------------------
B | 3948.02778 1 3948.02778 6.30 0.1288
B#A | 5184 1 5184 8.27 0.1026
B#G|A | 1253.80556 2 626.902778
-----------+----------------------------------------------------
B#G|A | 1253.80556 2 626.902778 17.95 0.0011
S|B#G|A | 279.333333 8 34.9166667
-----------+----------------------------------------------------
C | 25644.4306 2 12822.2153 36.24 0.0027
C#A | 75.875 2 37.9375 0.11 0.9008
C#G|A | 1415.19444 4 353.798611
-----------+----------------------------------------------------
C#B | 574.013889 2 287.006944 1.99 0.2515
C#B#A | 98.2916667 2 49.1458333 0.34 0.7303
C#B#G|A | 577.527778 4 144.381944
-----------+----------------------------------------------------
C#B#G|A | 577.527778 4 144.381944 0.57 0.6872
C#S|B#G|A | 4042 16 252.625
-----------+----------------------------------------------------
D | 110.722222 2 55.3611111 11.01 0.0236
D#A | 1.5 2 .75 0.15 0.8660
D#G|A | 20.1111111 4 5.02777778
-----------+----------------------------------------------------
D#B | 1.72222222 2 .861111111 0.08 0.9268
D#B#A | 24.5 2 12.25 1.10 0.4156
D#B#G|A | 44.4444444 4 11.1111111
-----------+----------------------------------------------------
D#B#G|A | 44.4444444 4 11.1111111 3.78 0.0238
D#S|B#G|A | 47 16 2.9375
-----------+----------------------------------------------------
D#C | 2.36111111 4 .590277778 0.25 0.8997
D#C#A | 8.5 4 2.125 0.91 0.5012
D#C#G|A | 18.6388889 8 2.32986111
-----------+----------------------------------------------------
D#C#B | 2.11111111 4 .527777778 0.42 0.7881
D#C#B#A | 12.0833333 4 3.02083333 2.42 0.1334
D#C#B#G|A | 9.97222222 8 1.24652778
-----------+----------------------------------------------------
D#C#B#G|A | 9.97222222 8 1.24652778 0.21 0.9859
|
Residual | 185.666667 32 5.80208333
-----------+----------------------------------------------------
Total | 54652.6389 143 382.186286
Between-subjects error term: S|B#G|A
Levels: 16 (8 df)
Lowest b.s.e. variable: S
Covariance pooled over: B#G|A (for repeated variables)
Repeated variable: C
Huynh-Feldt epsilon = 2.4621
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.9891
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
C | 2 36.24 0.0027 0.0027 0.0029 0.0265
C#A | 2 0.11 0.9008 0.9008 0.8991 0.7744
C#G|A | 4
-----------+----------------------------------------------------
C#B | 2 1.99 0.2515 0.2515 0.2524 0.2940
C#B#A | 2 0.34 0.7303 0.7303 0.7285 0.6186
C#B#G|A | 4
-----------+----------------------------------------------------
C#B#G|A | 4 0.57 0.6872 0.6872 0.6855 0.5861
C#S|B#G|A | 16
----------------------------------------------------------------
Repeated variable: D
Huynh-Feldt epsilon = 1.5569
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.7039
Box's conservative epsilon = 0.5000
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
D | 2 11.01 0.0236 0.0236 0.0481 0.0801
D#A | 2 0.15 0.8660 0.8660 0.8028 0.7365
D#G|A | 4
-----------+----------------------------------------------------
D#B | 2 0.08 0.9268 0.9268 0.8719 0.8069
D#B#A | 2 1.10 0.4156 0.4156 0.4107 0.4039
D#B#G|A | 4
-----------+----------------------------------------------------
D#B#G|A | 4 3.78 0.0238 0.0238 0.0446 0.0698
D#S|B#G|A | 16
----------------------------------------------------------------
Repeated variables: D#C
Huynh-Feldt epsilon = 1.5707
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.5864
Box's conservative epsilon = 0.2500
------------ Prob > F ------------
Source | df F Regular H-F G-G Box
-----------+----------------------------------------------------
D#C | 4 0.25 0.8997 0.8997 0.8155 0.6647
D#C#A | 4 0.91 0.5012 0.5012 0.4786 0.4404
D#C#G|A | 8
-----------+----------------------------------------------------
D#C#B | 4 0.42 0.7881 0.7881 0.7053 0.5820
D#C#B#A | 4 2.42 0.1334 0.1334 0.1891 0.2598
D#C#B#G|A | 8
-----------+----------------------------------------------------
D#C#B#G|A | 8 0.21 0.9859 0.9859 0.9454 0.8112
Residual | 32
----------------------------------------------------------------
. test C#G|A / C#B#G|A
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
C#G|A | 1415.19444 4 353.798611 2.45 0.2033
C#B#G|A | 577.527778 4 144.381944
. test D#G|A / D#B#G|A
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
D#G|A | 20.1111111 4 5.02777778 0.45 0.7693
D#B#G|A | 44.4444444 4 11.1111111
. test D#C#G|A / D#C#B#G|A
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
D#C#G|A | 18.6388889 8 2.32986111 1.87 0.1975
D#C#B#G|A | 9.97222222 8 1.24652778
With complicated designs, you might need a larger matrix than Stata allows.
If you get a “matsize too small” error, you can use the
dropemptycells option to eliminate empty cells from the design
matrix.
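As an illustration of where the option goes, here is the earlier two-repeated-variable model on the t713 data with dropemptycells added after the comma. That model does not need the option. It is used here only because the dataset is available from the URL given above. The /// line continuations assume the command is run from a do-file rather than typed interactively.

```stata
use http://www.stata-press.com/data/r12/t713, clear

* Same model as before; dropemptycells removes empty cells
* from the design matrix so that a smaller matsize suffices
anova score noise / subject|noise period noise#period / period#subject|noise ///
    dial noise#dial / dial#subject|noise period#dial noise#period#dial,      ///
    repeated(period dial) dropemptycells
```

Designs with nested terms tend to have many empty cells, so the option may reduce the required matrix size noticeably for models like the one above.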
Stata will allow up to four repeated-measures variables in the
repeated() option and can handle even more complicated designs than
presented here. The most limiting thing you will find with complicated
designs is the maximum matrix size allowed by Stata.
Summary
I have presented seven examples involving one repeated-measurement variable.
These examples range from the simplest design to a complicated design. With
all of these examples, I discussed the use of both anova with the
repeated() option and wsanova (Gleason 1999).
For simple designs involving only one repeated-measures variable, the
wsanova command syntax might be most natural, depending on how you
think about ANOVA models. With more complicated designs, I advise that you
first understand the underlying model you are trying to estimate and then
use the anova command to get what you need.
I presented three examples involving two repeated-measures variables (Stata
allows up to four repeated-measures variables). These examples also ranged
from simple to complex. With these examples I demonstrated only the
anova command because the wsanova command is not designed to
handle multiple repeated measures.
In the course of showing these examples, I also outlined the errors
users sometimes make and the solutions to those errors. Here is a summary
of common mistakes and solutions:
- You have your data in wide format instead of long format and cannot
figure out how to call the anova command to perform a
repeated-measures ANOVA. The answer is to change your data to long
format (the first example shows the use of
reshape in solving this problem).
- You get the r(421) error message saying “could not determine
between-subject error term; use bse() option” when running a simple
design. With simple designs this error is often caused by forgetting to
include the subject (person, dog, item, ...) variable in the model. The
first two examples illustrate this kind of simple model. The solution is
to make sure to include the subject term in the model.
- With more complicated designs, a common error is omitting the
between-subjects error term from the model, which also produces
an error message. The solution is to make
sure you understand the underlying model and then include the
between-subjects error term in your call to anova. Several of the
examples illustrate this point.
- You throw your variables into the anova command and use the
repeated() option without thinking about the underlying design.
You either get an error message or get output that you do not know
how to interpret. Again, the solution is to first
understand the underlying model before trying to analyze your data.
- You have a complicated design and Stata gives you the r(146) error
with message “too many variables or values (matsize too small)”.
Here you need to increase the maximum matrix size allowed. See help matsize
for details on setting the matrix size. Unfortunately, Stata
does have an upper limit on matsize. If your design is very large
and needs to create an ANOVA design matrix larger than the maximum
allowed, you can use the dropemptycells option of anova to
eliminate empty cells from the design matrix. If your matrix is still
too large, you will need to drop some of your high-order interactions.
Dropping interactions is the same as assuming that they are zero.
- You are running a large, complicated anova command (where you
needed to set the matsize much larger than usual) and Stata
appears to be frozen. With large designs there is a great deal of
computation going on behind the scenes. In these cases, Stata
will appear to pause at the beginning when you execute the command. It
will also pause occasionally after producing several lines of output.
This is normal for large ANOVA designs: Stata is busy performing
the calculations your ANOVA requires. The speed of your computer will
determine how quickly the command is executed.
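The first two items in this list can be illustrated with a minimal sketch. The data values and the variable names (person, score1 through score4, drug) are invented for illustration and are not the dataset used in the examples above.

```stata
* Made-up wide-format data: one row per person, one score per drug.
clear
input person score1 score2 score3 score4
1 12 15  9 18
2 10 13  8 14
3 14 18 11 20
4  9 12  7 13
5 13 14 10 19
end

* Wide to long: one row per person-drug combination.
* j(drug) creates the variable drug from the numeric suffix of score#.
reshape long score, i(person) j(drug)

* Include the subject term (person) in the model so that anova can
* work out the structure it needs for the repeated() option.
anova score person drug, repeated(drug)

* Leaving person out, as in
*     anova score drug, repeated(drug)
* is the kind of call that the second item above associates with r(421).
```

After reshape, the data have 20 observations (5 persons by 4 drugs), which is the long layout that anova with repeated() expects.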
Many problems can be avoided by first understanding your underlying model.
As the design becomes more complicated, this understanding becomes more
crucial. Books that cover ANOVA in detail, such as Winer, Brown, and Michels
(1991), can help you understand “fixed effects”, “random
effects”, “nesting”, “crossing”, and
“expected mean squares”, as well as how to determine the appropriate
error terms to use in your ANOVA F tests.
References
- Cole, J. W. L., and J. E. Grizzle. 1966. Applications of multivariate analysis of variance to repeated measures experiments. Biometrics 22: 810–828.
- Gleason, J. R. 1999. Within subjects (repeated measures) ANOVA, including between subjects factors. Stata Technical Bulletin 47: 40–45. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 236–243.
- Myers, J. L. 1966. Fundamentals of Experimental Design. Boston: Allyn and Bacon.
- Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New York: McGraw-Hill.
|