Title  Repeated-measures ANOVA examples
Authors  Kenneth Higbee, StataCorp; Wesley Eddings, StataCorp

Date  February 2000; updated April 2015 
Repeated-measures ANOVA, obtained with the repeated() option of the anova command, requires more structural information about your model than a regular ANOVA, as mentioned in the technical note on page 35 of [R] anova. When this information cannot be determined from what you provide in your anova command, you get error messages such as
could not determine between-subject error term; use bse() option
r(421);
or
could not determine between-subject basic unit; use bseunit() option
r(422);
These error messages can almost always be avoided with the proper specification of your ANOVA model.
You can jump ahead to the summary to see a list of common user errors and how to overcome them. The examples presented here demonstrate how to obtain a repeated-measures ANOVA and show ways to overcome common errors.
The command wsanova, written by John Gleason and presented in article sg103 of STB-47 (Gleason 1999), provides a different syntax for specifying certain types of repeated-measures ANOVA designs. Not all repeated-measures ANOVA designs are supported by wsanova, but for some problems you might find the syntax more intuitive. (See below for installation instructions.) In other cases, using Stata’s anova command with the repeated() option may be the more natural, or the only, way to obtain the analysis you seek.
The anova manual entry (see the Repeated-measures ANOVA section in [R] anova) presents three repeated-measures ANOVA examples. The examples range from a simple dataset, taken from table 4.3 of Winer, Brown, and Michels (1991), having five persons with measures on four drugs, to the more complicated data from table 7.13 of Winer, Brown, and Michels (1991) involving two repeated-measures variables (and their interactions) along with a between-subjects term.
Gleason (1999) demonstrates the wsanova command with data from Cole and Grizzle (1966). With these data he provides three examples that illustrate a repeated-measures ANOVA with none, one, and two between-subjects factors.
Here I demonstrate the anova and wsanova commands to specify various types of repeated-measures ANOVAs. I repeat the examples from the anova manual entry and the wsanova STB article (Gleason 1999). A couple of other examples are also presented. Seven examples involving one repeated variable and three examples involving two repeated variables are shown. Along the way I comment on the common types of user mistakes made in specifying these kinds of models and show how to overcome them.
The following examples illustrate various ways repeated-measures ANOVA models with one repeated-measures variable may be specified in Stata. I start with the simplest repeated-measures design and progress through more complicated designs. I demonstrate how to use both the anova command and the wsanova command (when possible) and discuss potential problems and possible solutions.
The example starting on page 32 of [R] anova is taken from table 4.3 of Winer, Brown, and Michels (1991). Using tabdisp we can get a tabular view of the data.
. use http://www.stata-press.com/data/r14/t43
(T4.3  Winer, Brown, Michels)

. tabdisp person drug, cellvar(score)

          |          drug
   person |    1     2     3     4
----------+-----------------------
        1 |   30    28    16    34
        2 |   14    18    10    22
        3 |   24    20    18    30
        4 |   38    34    20    44
        5 |   26    28    14    30
The data are in long format.
. list, sepby(person)
person drug score  
1.  1 1 30  
2.  1 2 28  
3.  1 3 16  
4.  1 4 34  
5.  2 1 14  
6.  2 2 18  
7.  2 3 10  
8.  2 4 22  
9.  3 1 24  
10.  3 2 20  
11.  3 3 18  
12.  3 4 30  
13.  4 1 38  
14.  4 2 34  
15.  4 3 20  
16.  4 4 44  
17.  5 1 26  
18.  5 2 28  
19.  5 3 14  
20.  5 4 30  
A common error is to try to run the anova (or wsanova) command with the data in wide format. For instance, if my data looked like this
. list
person drug1 drug2 drug3 drug4  
1.  1 30 28 16 34  
2.  2 14 18 10 22  
3.  3 24 20 18 30  
4.  4 38 34 20 44  
5.  5 26 28 14 30  
I would not be able to run the appropriate anova command. The data can be changed to the long format needed by anova by using the reshape command.
. reshape long drug, i(person) j(dr)
(note: j = 1 2 3 4)

Data                               wide   ->   long
Number of obs.                        5   ->   20
Number of variables                   5   ->   3
j variable (4 values)                     ->   dr
xij variables:
                  drug1 drug2 ... drug4   ->   drug
I would then have to rename the drug variable to score and the dr variable to drug to have the same variable names shown in my earlier listing of the original long-format dataset.
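The whole wide-to-long conversion can be sketched in a few lines. This is only an illustration: it types in the wide-format listing above by hand with input, and dr is simply the placeholder name used in the reshape call above.

```stata
* Sketch: enter the wide-format data shown above, then convert to long format
clear
input person drug1 drug2 drug3 drug4
1 30 28 16 34
2 14 18 10 22
3 24 20 18 30
4 38 34 20 44
5 26 28 14 30
end

* one row per person-drug combination; dr holds the drug number
reshape long drug, i(person) j(dr)

* match the variable names of the original long-format dataset
rename drug score
rename dr drug

list, sepby(person)
```

After these steps the data should have the same layout as the long-format listing shown earlier.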
The repeated-measures ANOVA for this example is
. anova score person drug, repeated(drug)
Source  Partial SS df MS F Prob > F  
Model  1379 7 197 20.96 0.0000  
person  680.8 4 170.2 18.11 0.0001  
drug  698.2 3 232.733333 24.76 0.0000  
Residual  112.8 12 9.4  
Total  1491.8 19 78.5157895 
Prob > F 
Source  df F Regular HF GG Box  
drug  3 24.76 0.0000 0.0000 0.0006 0.0076  
Residual  12  
An explanation of the output is included in the manual.
A common error when trying to run anova on this simple example is to enter
. anova score drug, repeated(drug)
could not determine between-subject error term; use bse() option
r(421);
You might be tempted, after seeing the above error message, to type
. anova score drug, repeated(drug) bse(person)
term not in model
r(147);
but this approach also fails. The moral of this last error message is that to perform the necessary computations for a repeated-measures ANOVA, the between-subjects error term must be a term in the ANOVA model. Here we need to have person as one of the terms in the model. This leads to the correct specification anova score person drug, repeated(drug) as shown earlier.
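The two failures and the working specification can be replayed in one short do-file. This is a sketch only: it assumes the t43 dataset is reachable at the Stata Press URL used for this example, and it uses capture noisily so that the do-file keeps running after each error. The return codes mentioned in the comments are the ones reported in the messages above.

```stata
use http://www.stata-press.com/data/r14/t43, clear

* person is not in the model, so no between-subject error term can be found
capture noisily anova score drug, repeated(drug)
display "return code: " _rc          // the message above reports r(421)

* naming person in bse() does not help while person is absent from the model
capture noisily anova score drug, repeated(drug) bse(person)
display "return code: " _rc          // the message above reports r(147)

* with person in the model, the repeated-measures ANOVA runs
anova score person drug, repeated(drug)
```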
The wsanova command presented in article sg103 of STB-47 (Gleason 1999) can also perform this analysis. To obtain this command, type net STB47 followed by net describe sg103, and then follow the installation instructions. See help stb for details.
. wsanova score drug, id(person) epsilon

Number of obs =      20     R-squared     = 0.9244
Root MSE      = 3.06594     Adj R-squared = 0.8803
Source  Partial SS df MS F Prob > F  
person  680.8 4 170.2  
drug  698.2 3 232.733333 24.76 0.0000  
Residual  112.8 12 9.4  
Total  1491.8 19 78.5157895 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
drug  3 24.76 0.0000 0.0006 0.0000 
We get the same information we did with the anova command. Which command to use for this simple case is a matter of personal preference. You can either use
anova score person drug, repeated(drug)
or download wsanova and use
wsanova score drug, id(person) epsilon
The examples in Gleason (1999) demonstrating the wsanova command use a dataset obtained from Cole and Grizzle (1966). With the net command (also see help stb), you can obtain the dataset, histamin.dta, as well as the wsanova command. Type net STB47 followed by net describe sg103, then follow the instructions.
Gleason’s first example, a “single factor within subject (randomized blocks) design” is the same underlying ANOVA design as presented in the previous example. Since this example is similar to the previous one, I simply show how you can obtain the analysis using the anova and wsanova commands without additional comments. The analysis using anova proceeds just as it did with our previous example. This time, we have lhist measurements on dogs over time. Unlike our first example, we restrict the analysis to the first group of dogs with the if group==1 command qualifier.
. use histamin, clear
(Blood histamine levels in dogs)

. anova lhist dog time if group==1, repeated(time)

Number of obs =      16     R-squared     = 0.9388
Root MSE      = .409681     Adj R-squared = 0.8979
Source  Partial SS df MS F Prob > F  
Model  23.159216 6 3.8598693 23.00 0.0001  
dog  16.902408 3 5.634136 33.57 0.0000  
time  6.2568079 3 2.0856026 12.43 0.0015  
Residual  1.5105466 9 .16783851  
Total  24.669763 15 1.6446508 
Prob > F 
Source  df F Regular HF GG Box  
time  3 12.43 0.0015 0.0138 0.0267 0.0388  
Residual  9  
The same results are also easily obtained with the wsanova command.
. wsanova lhist time if group==1, id(dog) epsilon

Number of obs =      16     R-squared     = 0.9388
Root MSE      = .409681     Adj R-squared = 0.8979
Source  Partial SS df MS F Prob > F  
dog  16.9024081 3 5.63413604  
time  6.25680792 3 2.08560264 12.43 0.0015  
Residual  1.51054662 9 .167838513  
Total  24.6697627 15 1.64465084 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
time  3 12.43 0.0015 0.0267 0.0138 
You may use
anova lhist dog time if group==1, repeated(time)
or download wsanova and use
wsanova lhist time if group==1, id(dog) epsilon
Both commands provide the same information.
The example starting on page 34 of [R] anova is taken from table 7.7 of Winer, Brown, and Michels (1991). By using tabdisp we can get a tabular view of the data.
. use http://www.stata-press.com/data/r14/t77, clear
(T7.7  Winer, Brown, Michels)

. tabdisp shape subject calib, cell(score)

          | 2 methods for calibrating dials and subject nested in calib
   4 dial | ------ 1 ------     ------ 2 ------
   shapes |    1     2     3       1     2     3
----------+-------------------------------------
        1 |    0     3     4       4     5     7
        2 |    0     1     3       2     4     5
        3 |    5     5     6       7     6     8
        4 |    3     4     2       8     6     9
I have the data in long form.
. list, sepby(subject)
calib subject shape score  
1.  1 1 1 0  
2.  1 1 2 0  
3.  1 1 3 5  
4.  1 1 4 3  
5.  1 2 1 3  
6.  1 2 2 1  
7.  1 2 3 5  
8.  1 2 4 4  
9.  1 3 1 4  
10.  1 3 2 3  
11.  1 3 3 6  
12.  1 3 4 2  
13.  2 1 1 4  
14.  2 1 2 2  
15.  2 1 3 7  
16.  2 1 4 8  
17.  2 2 1 5  
18.  2 2 2 4  
19.  2 2 3 6  
20.  2 2 4 6  
21.  2 3 1 7  
22.  2 3 2 5  
23.  2 3 3 8  
24.  2 3 4 9  
If instead you had the data in a wide format, you would need to use the reshape command to get it into long format before using the anova (or wsanova) command. For an example of using reshape, see the first example.
You should understand your model before attempting to use anova. For this dataset, both calib and shape are fixed while subject is random. The full model includes terms for calib, subject nested within calib, shape, shape interacted with calib, and shape interacted with subject nested within calib. As usual, we let this highest-order term drop and become the residual error. The shape variable is the repeated variable. This produces an ANOVA with one between-subjects factor (same underlying design as the next example). If you were to examine the expected mean squares for this setup (Winer, Brown, and Michels 1991), you would find the appropriate error term for the test of calib is subject|calib. The appropriate error term for shape and shape#calib is shape#subject|calib (which is the residual error since we do not include the term in the model).
Armed with this information, it becomes easy to specify the correct anova command.
. anova score calib / subject|calib shape calib#shape, repeated(shape)

Number of obs =      24     R-squared     = 0.8925
Root MSE      = 1.11181     Adj R-squared = 0.7939
Source  Partial SS df MS F Prob > F
Model  123.125 11 11.193182 9.06 0.0003
calib  51.041667 1 51.041667 11.89 0.0261
subject|calib  17.166667 4 4.2916667
shape  47.458333 3 15.819444 12.80 0.0005  
calib#shape  7.4583333 3 2.4861111 2.01 0.1662  
Residual  14.833333 12 1.2361111  
Total  137.95833 23 5.9981884 
Prob > F 
Source  df F Regular HF GG Box  
shape  3 12.80 0.0005 0.0011 0.0099 0.0232  
calib#shape  3 2.01 0.1662 0.1791 0.2152 0.2291  
Residual  12  
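As a check on the error-term logic, the F statistic for calib can be recomputed by hand from the mean squares in the table above. This sketch uses only display and the built-in Ftail() function; the numbers are copied from the output, so small rounding differences are to be expected.

```stata
* F for calib = MS(calib) / MS(subject within calib)
scalar F_calib = 51.041667 / 4.2916667
display "F(1, 4)  = " %6.2f F_calib

* upper-tail probability of the F distribution with 1 and 4 degrees of freedom
display "Prob > F = " %6.4f Ftail(1, 4, F_calib)
```

The two values should agree with the F and Prob > F entries reported for calib in the ANOVA table.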
A common error when you are unfamiliar with the underlying model is to just list some variables in the anova command (possibly with some interactions included) and then get the following error message.
. anova score calib subject shape calib#shape, repeated(shape)
could not determine between-subject error term; use bse() option
r(421);
Stata’s anova command needs the between-subject error term (here subject|calib) to be included in the model to obtain the repeated-measures corrections.
The wsanova command (Gleason 1999) seems like a natural alternative to use for this example. It seems you should be able to say
. wsanova score shape, id(subject) between(calib) epsilon
epsilon option is invalid with missing data
r(499);
but something went wrong. This dataset has no missing observations. This is just wsanova’s way of saying it is confused. What could have caused the confusion? Look at the listing of the data near the beginning of this example. In particular, pay attention to how the subject variable is set up. We have subjects going from 1 to 3 for the first level of calib and then going from 1 to 3 again for the second level of calib. anova was able to handle this, but wsanova is confused. We can help wsanova out of its confusion by generating a new variable that gives a unique number to each subject regardless of which level of calib is involved. We use the egen command with its group() function to do this.
. egen z = group(calib subject)

. wsanova score shape, id(z) between(calib) epsilon

Number of obs =      24     R-squared     = 0.8925
Root MSE      = 1.11181     Adj R-squared = 0.7939
Source  Partial SS df MS F Prob > F  
Between subjects:  51.0416667 1 51.0416667 11.89 0.0261  
calib  51.0416667 1 51.0416667 11.89 0.0261  
z*calib  17.1666667 4 4.29166667  
Within subjects:  54.9166667 6 9.15277778 7.40 0.0017  
shape  47.4583333 3 15.8194444 12.80 0.0005  
shape*calib  7.45833333 3 2.48611111 2.01 0.1662  
Residual  14.8333333 12 1.23611111  
Total  137.958333 23 5.99818841 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
shape  3 12.80 0.0005 0.0099 0.0011  
shape*calib  3 2.01 0.1662 0.2152 0.1791 
We have been able to reproduce the same results we obtained with anova. There is one test provided in the output of wsanova above that is not automatically produced with anova. If you look back at the ANOVA table produced by wsanova, you will see it produces an overall test for “Within subjects”. Here it produces an F of 7.40.
Within subjects:  54.9166667 6 9.15277778 7.40 0.0017
Using the test command we can easily obtain this same test after running anova.
. test shape calib#shape
Source  Partial SS df MS F Prob > F  
shape calib#shape  54.916667 6 9.1527778 7.40 0.0017  
Residual  14.833333 12 1.236111 
With this example you can either do
anova score calib / subject|calib shape calib#shape, repeated(shape)
or download wsanova (see above for installation instructions) and do
egen z = group(calib subject)
wsanova score shape, id(z) between(calib) epsilon
Both provide the same information.
The examples in Gleason (1999) demonstrating the wsanova command use a dataset obtained from Cole and Grizzle (1966). With the net command (also see help stb) you can obtain the dataset, histamin.dta, as well as the wsanova command (type net STB47 followed by net describe sg103, and then follow the instructions). Gleason’s second example, an ANOVA design with one between-subjects factor, is the same underlying ANOVA design presented in the previous example.
Since this example is similar to the previous one, I simply show how you can obtain the analysis using the anova and wsanova commands without additional comments. The analysis using anova proceeds just as it did with our previous example. This time we have lhist measurements on dogs nested within groups over time. Following the lead of Gleason (1999) we restrict the data with the if dog != 6 command qualifier.
. use histamin, clear
(Blood histamine levels in dogs)

. anova lhist group / dog|group time time#group if dog!=6, repeated(time)

Number of obs =      60     R-squared     = 0.9709
Root MSE      =  .27427     Adj R-squared = 0.9479
Source  Partial SS df MS F Prob > F
Model  82.683638 26 3.1801399 42.28 0.0000
group  27.028627 3 9.0095423 4.07 0.0359
dog|group  24.346834 11 2.2133486
time  12.058987 3 4.0196624 53.44 0.0000  
time#group  17.523292 9 1.9470324 25.88 0.0000  
Residual  2.4823889 33 .07522391  
Total  85.166027 59 1.443492 
Prob > F 
Source  df F Regular HF GG Box  
time  3 53.44 0.0000 0.0000 0.0000 0.0000  
time#group  9 25.88 0.0000 0.0000 0.0000 0.0000  
Residual  33  
I can obtain the overall within-subjects test as follows:
. test time time#group
Source  Partial SS df MS F Prob > F  
time time#group  31.308177 12 2.6090148 34.68 0.0000  
Residual  2.4823889 33 .07522391 
This same analysis is also easy with wsanova:
. wsanova lhist time if dog!=6, id(dog) between(group) epsilon

Number of obs =      60     R-squared     = 0.9709
Root MSE      =  .27427     Adj R-squared = 0.9479
Source  Partial SS df MS F Prob > F  
Between subjects:  27.0286268 3 9.00954226 4.07 0.0359  
group  27.0286268 3 9.00954226 4.07 0.0359  
dog*group  24.3468341 11 2.21334855  
Within subjects:  31.3081774 12 2.60901478 34.68 0.0000  
time  12.0589871 3 4.01966235 53.44 0.0000  
time*group  17.5232918 9 1.94703243 25.88 0.0000  
Residual  2.48238892 33 .075223907  
Total  85.1660271 59 1.44349199 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
time  3 53.44 0.0000 0.0000 0.0000  
time*group  9 25.88 0.0000 0.0000 0.0000 
This example has the dogs numbered from 1 to 16, so (unlike the previous example) there is no need to generate a new id() variable for the wsanova command.
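The contrast between this example and the previous one suggests checking, before calling wsanova, whether the subject identifier is unique across the levels of the between-subjects factor. The following is a rough sketch of one such check on hypothetical toy data in which subject numbers restart within each level of calib; the check is an illustration and not part of either command.

```stata
* hypothetical toy data: subject numbers restart within each level of calib
clear
input calib subject
1 1
1 2
1 3
2 1
2 2
2 3
end

* does every subject number appear in only one level of calib?
capture bysort subject (calib): assert calib[1] == calib[_N]
display "return code: " _rc      // nonzero here: subjects 1-3 occur in both levels

* a unique identifier built from both variables passes the same check
egen z = group(calib subject)
bysort z (calib): assert calib[1] == calib[_N]
```

If the first check fails for your own data, an identifier built with egen's group() function, as in the previous example, is one way to give wsanova what it needs.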
For this example, you can pick between running
anova lhist group / dog|group time time#group if dog != 6, repeated(time)
and downloading wsanova and running
wsanova lhist time if dog != 6, id(dog) between(group) epsilon
to obtain the results.
The third example in Gleason (1999) demonstrating the wsanova command also uses the histamin.dta dataset obtained from Cole and Grizzle (1966). This example expands on the previous example by splitting the group variable, which has four levels, into two variables, depleted and drug, each with two levels corresponding to a 2 × 2 factorial. We end up having two between-subjects factors plus their interaction. Again, following the lead of Gleason (1999), we restrict the data with the if dog != 6 command qualifier.
Here is the result of running wsanova on this dataset:
. use histamin, clear
(Blood histamine levels in dogs)

. wsanova lhist time if dog!=6, id(dog) between(drug depl drug*depl) eps

Number of obs =      60     R-squared     = 0.9709
Root MSE      =  .27427     Adj R-squared = 0.9479
Source  Partial SS df MS F Prob > F  
Between subjects:  27.0286268 3 9.00954226 4.07 0.0359  
drug  5.99336256 1 5.99336256 2.71 0.1281  
depleted  15.4484076 1 15.4484076 6.98 0.0229  
drug*depleted  4.69087549 1 4.69087549 2.12 0.1734  
dog*drug*depleted  24.3468341 11 2.21334855  
Within subjects:  31.3081774 12 2.60901478 34.68 0.0000  
time  12.0589871 3 4.01966235 53.44 0.0000  
time*drug  1.84429539 3 .614765129 8.17 0.0003  
time*depleted  12.0897855 3 4.02992849 53.57 0.0000  
time*drug*depleted  2.93077944 3 .976926479 12.99 0.0000  
Residual  2.48238892 33 .075223907  
Total  85.1660271 59 1.44349199 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
time  3 53.44 0.0000 0.0000 0.0000  
time*drug  3 8.17 0.0003 0.0039 0.0008  
time*depleted  3 53.57 0.0000 0.0000 0.0000  
time*drug*depleted  3 12.99 0.0000 0.0005 0.0000 
The anova command with the repeated() option can also be used on this problem:
. anova lhist drug dep drug#dep / dog|drug#dep time time#drug time#dep time#drug#dep
> if dog!=6, rep(time)

Number of obs =      60     R-squared     = 0.9709
Root MSE      =  .27427     Adj R-squared = 0.9479
Source  Partial SS df MS F Prob > F
Model  82.683638 26 3.1801399 42.28 0.0000
drug  6.1513201 1 6.1513201 2.78 0.1237
depleted  15.712679 1 15.712679 7.10 0.0220
drug#depleted  4.6908755 1 4.6908755 2.12 0.1734
dog|drug#depleted  24.346834 11 2.2133486
time  12.058987 3 4.0196624 53.44 0.0000  
time#drug  1.8442954 3 .61476513 8.17 0.0003  
time#depleted  12.089785 3 4.0299285 53.57 0.0000  
time#drug#depleted  2.9307794 3 .97692648 12.99 0.0000  
Residual  2.4823889 33 .07522391  
Total  85.166027 59 1.443492 
Prob > F 
Source  df F Regular HF GG Box  
time  3 53.44 0.0000 0.0000 0.0000 0.0000  
time#drug  3 8.17 0.0003 0.0008 0.0039 0.0156  
time#depleted  3 53.57 0.0000 0.0000 0.0000 0.0000  
time#drug#depleted  3 12.99 0.0000 0.0000 0.0005 0.0041  
Residual  33  
If you look closely, you will find a difference in the results for the drug and the depleted terms between anova and wsanova. This is due to the imbalance in the data from excluding the observations associated with the sixth dog.
. tabulate drug depleted if dog!=6
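      Drug |   Depleted pretest
administer |      histamines?
        ed |        No        Yes |     Total
-----------+----------------------+----------
  Morphine |        16         12 |        28
   TriMeth |        16         16 |        32
-----------+----------------------+----------
     Total |        32         28 |        60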
The wsanova command actually performs its work with two separate calls to anova instead of getting the whole ANOVA table at one time. The anova command with the repeated() option computes the complete model in one estimation. With imbalanced data, this difference in approach can sometimes change the results. In these cases, I recommend using the anova command.
Gleason (1999) also shows for this example how to use the wonly() option in conjunction with the between() option of wsanova to control which terms end up in the ANOVA table.
. wsanova lhist time if dog!=6, id(dog) between(drug depl) wonly(time time*depl) epsilon

Number of obs =      60     R-squared     = 0.9103
Root MSE      = .442692     Adj R-squared = 0.8642
Source  Partial SS df MS F Prob > F  
Between subjects:  22.3377513 2 11.1688756 4.62 0.0326  
drug  6.87754936 1 6.87754936 2.84 0.1176  
depleted  16.8857304 1 16.8857304 6.98 0.0215  
dog*drug*depleted  29.0377096 12 2.41980913  
Within subjects:  26.1474934 6 4.35791556 22.24 0.0000  
time  12.0454347 3 4.0151449 20.49 0.0000  
time*depleted  12.3626079 3 4.12086929 21.03 0.0000  
Residual  7.64307289 39 .195976228  
Total  85.1660271 59 1.44349199 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
time  3 20.49 0.0000 0.0000 0.0000  
time*depleted  3 21.03 0.0000 0.0000 0.0000 
We can, of course, obtain the same results directly with anova.
. anova lhist drug depl / dog|drug#depl time time#depl if dog!=6, rep(time)

Number of obs =      60     R-squared     = 0.9103
Root MSE      = .442692     Adj R-squared = 0.8642
Source  Partial SS df MS F Prob > F
Model  77.522954 20 3.8761477 19.78 0.0000
drug  6.8775494 1 6.8775494 2.84 0.1176
depleted  16.88573 1 16.88573 6.98 0.0215
dog|drug#depleted  29.03771 12 2.4198091
time  12.045435 3 4.0151449 20.49 0.0000  
time#depleted  12.362608 3 4.1208693 21.03 0.0000  
Residual  7.6430729 39 .19597623  
Total  85.166027 59 1.443492 
Prob > F 
Source  df F Regular HF GG Box  
time  3 20.49 0.0000 0.0000 0.0000 0.0006  
time#depleted  3 21.03 0.0000 0.0000 0.0000 0.0005  
Residual  39  
As with the previous examples, it is important to understand your model and to make sure to include the between-subjects error term in the model. Here it is the term dog|drug#depleted. The wsanova command puts this term (labeled as dog*drug*depleted) into the model automatically based on the options you specify.
This example does point out that for models with imbalance there can sometimes be a difference between wsanova and anova in the reported ANOVA table for some of the terms. In these cases, you should rely on the anova command.
This example is taken from the data of table 7.22 of Winer, Brown, and Michels (1991) and has a similar underlying structure to that of the previous example.
For this example, we have an experiment on a learning task with the variables anxiety and tension, each at two levels in a factorial layout. Nested within this interaction is subject. These are the variables involved in the between-subjects portion of our ANOVA. There are four trials, which form our repeated variable. We are also interested in examining the interaction of trial with the other terms in the model.
Here is a tabular view of the data:
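. use t722, clear
(T7.22  Winer, Brown, Michels)

. tabdisp subject trial, by(anxiety tension) c(response) concise stubw(10)

Rows: effect of anxiety (2 levels), muscular tension (2 levels), and subject

  anxiety   tension   subject |         trial
                              |    1     2     3     4
------------------------------+-----------------------
        1         1         1 |   18    14    12     6
        1         1         2 |   19    12     8     4
        1         1         3 |   14    10     6     2
------------------------------+-----------------------
        1         2         4 |   16    12    10     4
        1         2         5 |   12     8     6     2
        1         2         6 |   18    10     5     1
------------------------------+-----------------------
        2         1         7 |   16    10     8     4
        2         1         8 |   18     8     4     1
        2         1         9 |   16    12     6     2
------------------------------+-----------------------
        2         2        10 |   19    16    10     8
        2         2        11 |   16    14    10     9
        2         2        12 |   16    12     8     8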
In the following anova command, I take advantage of Stata’s ability to allow abbreviations for the variable names.
. anova response an te an#te / su|an#te tr an#tr te#tr an#te#tr, rep(tr)

Number of obs =      48     R-squared     = 0.9585
Root MSE      = 1.47432     Adj R-squared = 0.9188
Source  Partial SS df MS F Prob > F
Model  1205.833 23 52.427536 24.12 0.0000
anxiety  10.083333 1 10.083333 0.98 0.3517
tension  8.3333333 1 8.3333333 0.81 0.3949
anxiety#tension  80.083333 1 80.083333 7.77 0.0237
subject|anxiety#tension  82.5 8 10.3125
trial  991.5 3 330.5 152.05 0.0000  
anxiety#trial  8.4166667 3 2.8055556 1.29 0.3003  
tension#trial  12.166667 3 4.0555556 1.87 0.1624  
anxiety#tension#trial  12.75 3 4.25 1.96 0.1477  
Residual  52.166667 24 2.1736111  
Total  1258 47 26.765957 
Prob > F 
Source  df F Regular HF GG Box  
trial  3 152.05 0.0000 0.0000 0.0000 0.0000  
anxiety#trial  3 1.29 0.3003 0.3015 0.3002 0.2888  
tension#trial  3 1.87 0.1624 0.1693 0.1967 0.2091  
anxiety#tension#trial  3 1.96 0.1477 0.1550 0.1847 0.1996  
Residual  24  
The wsanova command (Gleason 1999) can also be used for this example.
. wsanova response trial, id(subject) between(anx tens anx*tens) epsilon

Number of obs =      48     R-squared     = 0.9585
Root MSE      = 1.47432     Adj R-squared = 0.9188
Source  Partial SS df MS F Prob > F  
Between subjects:  98.5 3 32.8333333 3.18 0.0845  
anxiety  10.0833333 1 10.0833333 0.98 0.3517  
tension  8.33333333 1 8.33333333 0.81 0.3949  
anxiety*tension  80.0833333 1 80.0833333 7.77 0.0237  
subject*anxiety*tension  82.5 8 10.3125  
Within subjects:  1024.83333 12 85.4027778 39.29 0.0000  
trial  991.5 3 330.5 152.05 0.0000  
trial*anxiety  8.41666667 3 2.80555556 1.29 0.3003  
trial*tension  12.1666667 3 4.05555556 1.87 0.1624  
trial*anxiety*tension  12.75 3 4.25 1.96 0.1477  
Residual  52.1666667 24 2.17361111  
Total  1258 47 26.7659574 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
trial  3 152.05 0.0000 0.0000 0.0000  
trial*anxiety  3 1.29 0.3003 0.3002 0.3015  
trial*tension  3 1.87 0.1624 0.1967 0.1693  
trial*anxiety*tension  3 1.96 0.1477 0.1847 0.1550 
You can choose between
anova response an te an#te / su|an#te tr an#tr te#tr an#te#tr, rep(tr)
or download wsanova (see above for installation instructions) and then type
wsanova response trial, id(subject) between(anx tens anx*tens) epsilon
Table 9–11 of Myers (1966) presents an interesting dataset with factor A having two levels, G (representing groups) nested within A (a total of four groups), factor B with two levels that is crossed with A and G|A, S (representing subjects) nested within all of this (S|B#G|A) for a total of 16 subjects, then factor C, the repeated-measures variable with three levels. Each of the 16 subjects has measures for the three levels of C. The interaction of C with the other terms is also included in the model.
Here is a look at the data:
. use tm911, clear

. list, sepby(G)
A G B S C res  
1.  1 1 1 1 1 4  
2.  1 1 1 1 2 5  
3.  1 1 1 1 3 8  
4.  1 1 1 2 1 3  
5.  1 1 1 2 2 6  
6.  1 1 1 2 3 10  
7.  1 1 2 3 1 3  
8.  1 1 2 3 2 6  
9.  1 1 2 3 3 10  
10.  1 1 2 4 1 4  
11.  1 1 2 4 2 5  
12.  1 1 2 4 3 9  
13.  1 2 1 5 1 4  
14.  1 2 1 5 2 7  
15.  1 2 1 5 3 8  
16.  1 2 1 6 1 3  
17.  1 2 1 6 2 6  
18.  1 2 1 6 3 9  
19.  1 2 2 7 1 1  
20.  1 2 2 7 2 6  
21.  1 2 2 7 3 8  
22.  1 2 2 8 1 4  
23.  1 2 2 8 2 2  
24.  1 2 2 8 3 12  
25.  2 3 1 9 1 7  
26.  2 3 1 9 2 7  
27.  2 3 1 9 3 11  
28.  2 3 1 10 1 4  
29.  2 3 1 10 2 8  
30.  2 3 1 10 3 14  
31.  2 3 2 11 1 9  
32.  2 3 2 11 2 8  
33.  2 3 2 11 3 16  
34.  2 3 2 12 1 7  
35.  2 3 2 12 2 10  
36.  2 3 2 12 3 19  
37.  2 4 1 13 1 3  
38.  2 4 1 13 2 5  
39.  2 4 1 13 3 9  
40.  2 4 1 14 1 2  
41.  2 4 1 14 2 7  
42.  2 4 1 14 3 8  
43.  2 4 2 15 1 10  
44.  2 4 2 15 2 12  
45.  2 4 2 15 3 13  
46.  2 4 2 16 1 9  
47.  2 4 2 16 2 11  
48.  2 4 2 16 3 15  
Myers (1966) indicates that for this example the ANOVA table should have the following structure:
Model Term          F test
Between S
  Between G
    A               MS(A) / MS(G|A)
    G|A
  Within G
    B               MS(B) / MS(B#G|A)
    B#A             MS(B#A) / MS(B#G|A)
    B#G|A           MS(B#G|A) / MS(S|B#G|A)
    S|B#G|A
Within S
    C               MS(C) / MS(C#G|A)
    C#A             MS(C#A) / MS(C#G|A)
    C#G|A           MS(C#G|A) / MS(C#B#G|A)
    C#B             MS(C#B) / MS(C#B#G|A)
    C#B#A           MS(C#B#A) / MS(C#B#G|A)
    C#B#G|A         MS(C#B#G|A) / MS(C#S|B#G|A)
    C#S|B#G|A
How did Myers (1966) determine the appropriate mean square to use in the denominator of each of the F tests listed above? He first determined which factors were fixed and which were random and which factors were nested and which were crossed. Then, from that, he figured the expected mean squares for each term. From these he could see which terms were the appropriate error terms for other terms in the model. See Winer, Brown, and Michels (1991) or some other good book on ANOVA modeling to understand “fixed factors”, “random factors”, “nesting”, “crossing”, “expected mean squares”, etc.
The anova command allows the “/” notation, which indicates that the terms to the left of the slash are to be tested using the term to the right of the slash as the error term. This notation makes it easy to get all but one of the F tests from the complicated ANOVA table above with one call to anova. The remaining F test (the test for the C#G|A term) is easily obtained with a call to the test command after running anova. Again, I drop the largest possible interaction term (C#S|B#G|A) so that the residual (which would have had zero degrees of freedom if the term were left in the model) becomes that interaction term.
. anova res A / G|A B B#A / B#G|A / S|B#G|A C C#A / C#G|A C#B C#B#A / C#B#G|A / , rep(C)

Number of obs =      48     R-squared     = 0.9346
Root MSE      = 1.70171     Adj R-squared = 0.8080
Source  Partial SS df MS F Prob > F
Model  662.64583 31 21.375672 7.38 0.0001
A  136.6875 1 136.6875 24.76 0.0381
G|A  11.041667 2 5.5208333
B  54.1875 1 54.1875 7.45 0.1121
B#A  67.6875 1 67.6875 9.31 0.0927
B#G|A  14.541667 2 7.2708333
B#G|A  14.541667 2 7.2708333 13.96 0.0025
S|B#G|A  4.1666667 8 .52083333
C  337.16667 2 168.58333 34.88 0.0029
C#A  1.5 2 .75 0.16 0.8612
C#G|A  19.333333 4 4.8333333
C#B  8 2 4 2.04 0.2448
C#B#A  .5 2 .25 0.13 0.8836
C#B#G|A  7.8333333 4 1.9583333
C#B#G|A  7.8333333 4 1.9583333 0.68 0.6182
Residual  46.333333 16 2.8958333  
Total  708.97917 47 15.084663 
Prob > F 
Source  df F Regular HF GG Box  
C  2 34.88 0.0029 0.0029 0.0030 0.0275  
C#A  2 0.16 0.8612 0.8612 0.8605 0.7317  
C#G|A  4
C#B  2 2.04 0.2448 0.2448 0.2451 0.2892
C#B#A  2 0.13 0.8836 0.8836 0.8830 0.7551
C#B#G|A  4
C#B#G|A  4 0.68 0.6182 0.6182 0.6177 0.5354
Residual  16
. test C#G|A / C#B#G|A
Source  Partial SS df MS F Prob > F
C#G|A  19.333333 4 4.8333333 2.47 0.2015
C#B#G|A  7.8333333 4 1.9583333
The wsanova command (Gleason 1999) can produce the appropriate mean squares for the terms in the model but will not be able to automatically create the correct F tests for most of the terms. It does not understand all of the structure of this complicated model. Here is what you can obtain from wsanova:
. wsanova res C, id(S) between(A G*A B B*A B*G*A) epsilon Number of obs = 48 Rsquared = 0.9346 Root MSE = 1.70171 Adj Rsquared = 0.8080
Source  Partial SS df MS F Prob > F  
Between subjects:  284.145833 7 40.5922619 77.94 0.0000  
A  136.6875 1 136.6875 262.44 0.0000  
G*A  11.0416667 2 5.52083333 10.60 0.0056  
B  54.1875 1 54.1875 104.04 0.0000  
B*A  67.6875 1 67.6875 129.96 0.0000  
B*G*A  14.5416667 2 7.27083333 13.96 0.0025  
S*A*G*B  4.16666667 8 .520833333  
Within subjects:  374.333333 16 23.3958333 8.08 0.0001  
C  337.166667 2 168.583333 58.22 0.0000  
C*A  1.5 2 .75 0.26 0.7750  
C*G*A  19.3333333 4 4.83333333 1.67 0.2061  
C*B  8 2 4 1.38 0.2797  
C*B*A  .5 2 .25 0.09 0.9177  
C*B*G*A  7.83333333 4 1.95833333 0.68 0.6182  
Residual  46.3333333 16 2.89583333  
Total  708.979167 47 15.0846631 
Source  df F Prob > F (sphericity) Prob > F (G-G) Prob > F (H-F)
C  2 58.22 0.0000 0.0000 0.0000  
C*A  2 0.26 0.7750 0.7742 0.7750  
C*G*A  4 1.67 0.2061 0.2064 0.2061  
C*B  2 1.38 0.2797 0.2797 0.2797  
C*B*A  2 0.09 0.9177 0.9171 0.9177  
C*B*G*A  4 0.68 0.6182 0.6177 0.6182 
Remember that for this complicated ANOVA you should ignore most of the F tests produced in the output from the wsanova command. Instead, you need to produce the correct F tests from the mean squares in the ANOVA table after running wsanova. Using the anova command and taking advantage of the “/” notation gives you the appropriate F tests directly in the ANOVA table.
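Producing the correct tests from the wsanova mean squares amounts to a few divisions. The sketch below does this for two of the terms, pairing each term with the error term listed in the table of F tests from Myers (1966) and copying the mean squares from the wsanova output above; Ftail() supplies the p-value. The scalar names are arbitrary.

```stata
* A is tested against G nested within A: 1 and 2 degrees of freedom
scalar F_A = 136.6875 / 5.52083333
display "A:  F(1, 2) = " %6.2f F_A "   Prob > F = " %6.4f Ftail(1, 2, F_A)

* C is tested against C by (G nested within A): 2 and 4 degrees of freedom
scalar F_C = 168.583333 / 4.83333333
display "C:  F(2, 4) = " %6.2f F_C "   Prob > F = " %6.4f Ftail(2, 4, F_C)
```

These hand computations should agree, up to rounding, with the F statistics that anova reports directly for A and C when the “/” notation is used.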
If you did not understand the underlying model for this example and just tried entering variable names into the anova command hoping something good would come out, you would most likely be disappointed. While understanding the underlying model is helpful with simple problems, it becomes crucial with more complicated designs.
Shown below are three examples of repeated-measures ANOVAs in which the subjects have repeated observations over more than one variable. In the previous section of this document, I outlined the use of both anova and wsanova (Gleason 1999); with more than one repeated-measures variable, the anova command is the only choice.
This example is obtained by restricting our attention to only one level of the between-subjects variable in the data from the next example. This choice produces an example with no between-subjects factors and two repeated variables. The data come from table 7.13 of Winer, Brown, and Michels (1991). After keeping only those observations of interest to this example, we have three subjects, each with nine accuracy scores on all combinations of the three different dials and three different periods. With subject a random factor and both dial and period fixed factors, the appropriate error term for the test of dial is the dial#subject interaction. Likewise, period#subject is the correct error term for period, and period#dial#subject (which we will drop so that it becomes residual error) is the appropriate error term for period#dial.
Here are the data:
. use http://www.stata-press.com/data/r14/t713, clear
(T7.13  Winer, Brown, Michels)
. keep if noise==1
(27 observations deleted)
. drop noise
. label var subject ""
. tabdisp subject dial period, cell(score)

          10 minute time periods and type of dial
         ---- 1 ----    ---- 2 ----    ---- 3 ----
subject    1    2    3    1    2    3    1    2    3
      1   45   53   60   40   52   57   28   37   46
      2   35   41   50   30   37   47   25   32   41
      3   60   65   75   58   54   70   40   47   50
By specifying both the period and dial variables in the repeated() option of anova along with appropriate use of the “/” notation for specifying the proper error terms in the model, we can easily obtain the desired ANOVA table.
. anova score subject period / subject#period dial / subject#dial period#dial,
> repeated(period dial)

Number of obs = 27          R-squared     = 0.9871
Root MSE      = 2.60342     Adj R-squared = 0.9580
Source  Partial SS df MS F Prob > F  
Model  4146.4444 18 230.35802 33.99 0.0000  
subject  1828.2222 2 914.11111 29.54 0.0040  
period  1124.6667 2 562.33333 18.17 0.0098  
subject#period  123.77778 4 30.944444  
dial  1020.6667 2 510.33333 51.32 0.0014  
subject#dial  39.777778 4 9.9444444  
period#dial  9.3333333 4 2.3333333 0.34 0.8410  
Residual  54.222222 8 6.7777778  
Total  4200.666 26 161.5641 
Prob > F 
Source  df F Regular HF GG Box  
period  2 18.17 0.0098 0.0275 0.0441 0.0509  
subject#period  4  
Prob > F 
Source  df F Regular HF GG Box  
dial  2 51.32 0.0014 0.0062 0.0147 0.0189  
subject#dial  4  
Prob > F 
Source  df F Regular HF GG Box  
period#dial  4 0.34 0.8410 0.6246 0.6187 0.6168  
Residual  8  
The test on subject in the main ANOVA table should be ignored.
With multiple repeated variables, we obtain the various epsilon corrections (Greenhouse–Geisser, Huynh–Feldt, Box’s conservative epsilon) to the p-values for each repeated variable and for each interaction of those repeated variables.
This example can be found starting on page 36 of [R] anova. The data are from table 7.13 of Winer, Brown, and Michels (1991). There is one between-subjects factor, noise, with two levels. There are three subjects nested within each level of noise. As with the previous example, there are two repeated variables, period and dial, each with three levels, so that each subject has nine values recorded. Details of this dataset and the underlying model can be found in [R] anova and in Winer, Brown, and Michels (1991).
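To make the correction concrete: each epsilon multiplies both the numerator and the denominator degrees of freedom of the F test before the p-value is computed. The sketch below reproduces the Box column for period under the assumption that Box's conservative epsilon is the reciprocal of the effect's degrees of freedom (1/2 here); the scalar names are hypothetical.

```stata
* F for period: MS(period) / MS(subject#period), taken from the ANOVA table above
scalar Fperiod = 562.33333/30.944444
scalar eps_box = 1/2                  // assumed: 1/(df of period)
* Scale both degrees of freedom (2 and 4) by epsilon
scalar p_box = Ftail(2*eps_box, 4*eps_box, Fperiod)
display "Box-adjusted Prob > F = " %6.4f p_box
```

The value should be close to the 0.0509 reported in the Box column for period. The HF and GG columns are computed the same way, except that their epsilons are estimated from the data.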
Here are the data:
. use http://www.stata-press.com/data/r14/t713, clear
(T7.13  Winer, Brown, Michels)
. tabdisp subject dial period, by(noise) cell(score) stubwidth(11)

noise
background
and subject   10 minute time periods and type of dial
nested in    ---- 1 ----    ---- 2 ----    ---- 3 ----
noise          1    2    3    1    2    3    1    2    3
1
          1   45   53   60   40   52   57   28   37   46
          2   35   41   50   30   37   47   25   32   41
          3   60   65   75   58   54   70   40   47   50
2
          1   50   48   61   25   34   51   16   23   35
          2   42   45   55   30   37   43   22   27   37
          3   56   60   77   40   39   57   31   29   46
Here are the ANOVA results for these data:
. anova score noise / subject|noise period noise#period / period#subject|noise dial
> noise#dial / dial#subject|noise period#dial noise#period#dial, repeated(period dial)

Number of obs = 54          R-squared     = 0.9872
Root MSE      = 2.81859     Adj R-squared = 0.9576
Source  Partial SS df MS F Prob > F  
Model  9797.7222 37 264.8033 33.33 0.0000  
noise  468.16667 1 468.16667 0.75 0.4348  
subject|noise  2491.1111 4 622.77778  
period  3722.3333 2 1861.1667 63.39 0.0000  
noise#period  333 2 166.5 5.67 0.0293  
period#subject|noise  234.88889 8 29.361111  
dial  2370.3333 2 1185.1667 89.82 0.0000  
noise#dial  50.333333 2 25.166667 1.91 0.2102  
dial#subject|noise  105.55556 8 13.194444  
period#dial  10.666667 4 2.6666667 0.34 0.8499  
noise#period#dial  11.333333 4 2.8333333 0.36 0.8357  
Residual  127.11111 16 7.9444444  
Total  9924.8333 53 187.26101 
Prob > F 
Source  df F Regular HF GG Box  
period  2 63.39 0.0000 0.0000 0.0003 0.0013  
noise#period  2 5.67 0.0293 0.0293 0.0569 0.0759  
period#subject|noise  8  
Prob > F 
Source  df F Regular HF GG Box  
dial  2 89.82 0.0000 0.0000 0.0000 0.0007  
noise#dial  2 1.91 0.2102 0.2102 0.2152 0.2394  
dial#subject|noise  8  
Prob > F 
Source  df F Regular HF GG Box  
period#dial  4 0.34 0.8499 0.8499 0.7295 0.5934  
noise#period#dial  4 0.36 0.8357 0.8357 0.7156 0.5825  
Residual  16  
Again we see that, in addition to the main ANOVA table, we obtain an adjusted table for each repeated variable (and for their interaction). These tables give the epsilon adjustments to the p-values for those terms in the model involving the repeated-measures variable(s).
This example is an expanded version of the last example in the single-repeated-variable section of this document (a complicated design with one repeated variable). The original data and example were taken from table 9–11 of Myers (1966). I added another repeated-measures variable, D, with three levels (thus expanding the data by a factor of three). I created a fake res variable to replace the one provided in table 9–11 of Myers (1966). The new model is much larger than the original because D is interacted with all the other terms in the model.
Here is part of the data:
. list, sep(12)
A G B S C D res  
1.  1 1 1 1 1 1 22  
2.  1 1 1 1 1 2 23  
3.  1 1 1 1 1 3 29  
4.  1 1 1 1 2 1 28  
5.  1 1 1 1 2 2 30  
6.  1 1 1 1 2 3 34  
7.  1 1 1 1 3 1 41  
8.  1 1 1 1 3 2 42  
9.  1 1 1 1 3 3 45  
10.  1 1 1 2 1 1 15  
11.  1 1 1 2 1 2 19  
12.  1 1 1 2 1 3 15  
13.  1 1 1 2 2 1 31  
14.  1 1 1 2 2 2 31  
15.  1 1 1 2 2 3 30  
...  
133.  2 4 2 15 3 1 67  
134.  2 4 2 15 3 2 67  
135.  2 4 2 15 3 3 71  
136.  2 4 2 16 1 1 48  
137.  2 4 2 16 1 2 51  
138.  2 4 2 16 1 3 48  
139.  2 4 2 16 2 1 56  
140.  2 4 2 16 2 2 61  
141.  2 4 2 16 2 3 60  
142.  2 4 2 16 3 1 76  
143.  2 4 2 16 3 2 75  
144.  2 4 2 16 3 3 78  
Following the lead of Myers (1966), I want to create an ANOVA table with the following information:
Model Term  F test  
Between S  
Between G  
A  MS(A) / MS(G|A)  
G|A  
Within G  
B  MS(B) / MS(B#G|A)  
B#A  MS(B#A) / MS(B#G|A)  
B#G|A  MS(B#G|A) / MS(S|B#G|A)  
S|B#G|A  
Within S  
C  MS(C) / MS(C#G|A)  
C#A  MS(C#A) / MS(C#G|A)  
C#G|A  MS(C#G|A) / MS(C#B#G|A)  
C#B  MS(C#B) / MS(C#B#G|A)  
C#B#A  MS(C#B#A) / MS(C#B#G|A)  
C#B#G|A  MS(C#B#G|A) / MS(C#S|B#G|A)  
C#S|B#G|A  
D  MS(D) / MS(D#G|A)  
D#A  MS(D#A) / MS(D#G|A)  
D#G|A  MS(D#G|A) / MS(D#B#G|A)  
D#B  MS(D#B) / MS(D#B#G|A)  
D#B#A  MS(D#B#A) / MS(D#B#G|A)  
D#B#G|A  MS(D#B#G|A) / MS(D#S|B#G|A)  
D#S|B#G|A  
D#C  MS(D#C) / MS(D#C#G|A)  
D#C#A  MS(D#C#A) / MS(D#C#G|A)  
D#C#G|A  MS(D#C#G|A) / MS(D#C#B#G|A)  
D#C#B  MS(D#C#B) / MS(D#C#B#G|A)  
D#C#B#A  MS(D#C#B#A) / MS(D#C#B#G|A)  
D#C#B#G|A  MS(D#C#B#G|A) / MS(D#C#S|B#G|A)  
D#C#S|B#G|A  
By writing the anova model in natural order (see above) and using the “/” notation, I can get all but three of the tests outlined above with one call to anova. The other three tests (on C#G|A, D#G|A, and D#C#G|A) can be obtained using the test command.
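A sketch of those three follow-up tests appears below. It assumes the anova model has just been fit to the expanded data and that the nested terms are written with Stata's | nesting operator (for example, G|A for G nested within A). Each command tests the term before the slash, using the term after the slash as the error term.

```stata
* Run immediately after the anova command; the terms must match those in the model
test C#G|A / C#B#G|A
test D#G|A / D#B#G|A
test D#C#G|A / D#C#B#G|A
```

Each call should print a short two-line table giving the tested term, its error term, the F statistic, and Prob > F.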
As more terms are added to the model, the matsize must be set higher to accommodate the larger model. Here I had to set the matsize to 2322. Also realize that with large designs it may take a while to run. Depending on the speed of your computer, you will probably see Stata pausing for a while then printing out a few lines of output and then pausing again. This is normal behavior.
Here is the anova run:
. set matsize 2322

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    set maxvar         5000     max. variables allowed           1.909M
    set memory          50M     max. data space                 50.000M
    set matsize        2322     max. RHS vars in models         41.330M
                                                                93.239M
Source  Partial SS df MS F Prob > F  
Model  54466.9722 111 490.693443 84.57 0.0000  
A  10201 1 10201 23.46 0.0401  
G|A  869.805556 2 434.902778  
B  3948.02778 1 3948.02778 6.30 0.1288  
B#A  5184 1 5184 8.27 0.1026  
B#G|A  1253.80556 2 626.902778  
B#G|A  1253.80556 2 626.902778 17.95 0.0011  
S|B#G|A  279.333333 8 34.9166667  
C  25644.4306 2 12822.2153 36.24 0.0027  
C#A  75.875 2 37.9375 0.11 0.9008  
C#G|A  1415.19444 4 353.798611  
C#B  574.013889 2 287.006944 1.99 0.2515  
C#B#A  98.2916667 2 49.1458333 0.34 0.7303  
C#B#G|A  577.527778 4 144.381944  
C#B#G|A  577.527778 4 144.381944 0.57 0.6872  
C#S|B#G|A  4042 16 252.625  
D  110.722222 2 55.3611111 11.01 0.0236  
D#A  1.5 2 .75 0.15 0.8660  
D#G|A  20.1111111 4 5.02777778  
D#B  1.72222222 2 .861111111 0.08 0.9268  
D#B#A  24.5 2 12.25 1.10 0.4156  
D#B#G|A  44.4444444 4 11.1111111  
D#B#G|A  44.4444444 4 11.1111111 3.78 0.0238  
D#S|B#G|A  47 16 2.9375  
D#C  2.36111111 4 .590277778 0.25 0.8997  
D#C#A  8.5 4 2.125 0.91 0.5012  
D#C#G|A  18.6388889 8 2.32986111  
D#C#B  2.11111111 4 .527777778 0.42 0.7881  
D#C#B#A  12.0833333 4 3.02083333 2.42 0.1334  
D#C#B#G|A  9.97222222 8 1.24652778  
D#C#B#G|A  9.97222222 8 1.24652778 0.21 0.9859  
Residual  185.666667 32 5.80208333  
Total  54652.6389 143 382.186286 
Prob > F 
Source  df F Regular HF GG Box  
C  2 36.24 0.0027 0.0027 0.0029 0.0265  
C#A  2 0.11 0.9008 0.9008 0.8991 0.7744  
C#G|A  4  
C#B  2 1.99 0.2515 0.2515 0.2524 0.2940  
C#B#A  2 0.34 0.7303 0.7303 0.7285 0.6186  
C#B#G|A  4  
C#B#G|A  4 0.57 0.6872 0.6872 0.6855 0.5861  
C#S|B#G|A  16  
Prob > F 
Source  df F Regular HF GG Box  
D  2 11.01 0.0236 0.0236 0.0481 0.0801  
D#A  2 0.15 0.8660 0.8660 0.8028 0.7365  
D#G|A  4  
D#B  2 0.08 0.9268 0.9268 0.8719 0.8069  
D#B#A  2 1.10 0.4156 0.4156 0.4107 0.4039  
D#B#G|A  4  
D#B#G|A  4 3.78 0.0238 0.0238 0.0446 0.0698  
D#S|B#G|A  16  
Prob > F 
Source  df F Regular HF GG Box  
D#C  4 0.25 0.8997 0.8997 0.8155 0.6647  
D#C#A  4 0.91 0.5012 0.5012 0.4786 0.4404  
D#C#G|A  8  
D#C#B  4 0.42 0.7881 0.7881 0.7053 0.5820  
D#C#B#A  4 2.42 0.1334 0.1334 0.1891 0.2598  
D#C#B#G|A  8  
D#C#B#G|A  8 0.21 0.9859 0.9859 0.9454 0.8112  
Residual  32  
Source  Partial SS df MS F Prob > F  
C#G|A  1415.19444 4 353.798611 2.45 0.2033  
C#B#G|A  577.527778 4 144.381944 
Source  Partial SS df MS F Prob > F  
D#G|A  20.1111111 4 5.02777778 0.45 0.7693  
D#B#G|A  44.4444444 4 11.1111111 
Source  Partial SS df MS F Prob > F  
D#C#G|A  18.6388889 8 2.32986111 1.87 0.1975  
D#C#B#G|A  9.97222222 8 1.24652778 
With complicated designs, you might need a larger matrix than Stata allows. If you get a “matsize too small” error, you can use the dropemptycells option to eliminate empty cells from the design matrix.
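As a hedged sketch of where the option goes, the do-file fragment below refits the two-repeated-variable noise model from earlier with dropemptycells added after repeated(). It assumes the example dataset URL is reachable and uses /// line continuation, which works in do-files but not at the interactive prompt. The model is used here only to show placement; it is not one that needs the option.

```stata
* Load the example data used earlier (requires internet access)
use http://www.stata-press.com/data/r14/t713, clear
* Same model as before; dropemptycells leaves empty cells out of the design matrix
anova score noise / subject|noise period noise#period /  ///
      period#subject|noise dial noise#dial /              ///
      dial#subject|noise period#dial noise#period#dial,   ///
      repeated(period dial) dropemptycells
```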
Stata will allow up to four repeated-measures variables in the repeated() option and can handle even more complicated designs than those presented here. The most limiting factor you will encounter with complicated designs is the maximum matrix size allowed by Stata.
I have presented seven examples involving one repeated-measures variable. These examples range from the simplest design to a complicated design. With all of these examples, I discussed the use of both anova with the repeated() option and wsanova (Gleason 1999).
For simple designs involving only one repeated-measures variable, the wsanova command syntax might be most natural, depending on how you think about ANOVA models. With more complicated designs, I advise that you first understand the underlying model you are trying to estimate and then use the anova command to get what you need.
I presented three examples involving two repeated-measures variables (Stata allows up to four repeated-measures variables). These examples also ranged from simple to complex. With these examples, I demonstrated only the anova command because the wsanova command is not designed to handle multiple repeated-measures variables.
In the course of showing these examples, I also outlined the errors users sometimes make and the solutions to those errors. Here is a summary of common mistakes and solutions:
Many problems can be avoided by first understanding your underlying model. As the design becomes more complicated, this understanding becomes more crucial. Books that cover ANOVA in detail, such as Winer, Brown, and Michels (1991), can help you understand “fixed effects”, “random effects”, “nesting”, “crossing”, and “expected mean squares”, and can show you how to determine the appropriate error terms to use in your ANOVA F tests.