The Stata listserver

Re: st: repeated measures ANOVA with missing observations

Subject   Re: st: repeated measures ANOVA with missing observations
Date   Mon, 02 Dec 2002 12:34:25 -0600

David Ronis asks:

> My experience with other software is that repeated measures ANOVA will
> either drop cases with any missing data or fail to run when there are
> missing data.  Before getting into more complex procedures like SAS PROC
> MIXED I thought I'd give it a try in Stata.  My expectation was that the
> failure would help motivate me for the work ahead.
> I studied Kenneth Higbee's FAQ a bit at
> <cut>
> To my surprise, the following code ran and gave results that seemed
> reasonable (given my eyeballing the data and means):
> clear
> capture log close
> log using preeval2.log, replace
> * Approach from Higbee FAQ
> *
> set matsize 800
> set memory 4m
> set more off
> use e:\yeo2\vo2-pree-val.dta
> anova vo2 id time / time*id stage / stage*id machine / machine*id  /*
>  */  time*stage / time*stage*id   /*
>  */  time*machine  / time*machine*id   /*
>  */  stage*machine / stage*machine*id  /*
>  */  time*stage*machine /       ,  /*
>  */  repeated (time stage machine)
> I'm wondering whether this is really an appropriate analysis, and what
> assumptions it / I may be making (especially unusual ones)?  For sig test
> results I'm looking at the adjusted ones, not those in the initial ANOVA
> table.  It has been about 20 years since I studied ANOVA.

In the interest of brevity I will point you (and others interested
in the subject) back to some statalist threads of long ago.

At the end of July 2001 a similar question was asked.  You can
go to

to read the message.  It quotes from

    Milliken & Johnson, 1984, "Analysis of Messy Data, Volume 1:
    Designed Experiments", Van Nostrand Reinhold Company, New York.
    ISBN: 0-534-02713-7

and has some further discussion.  It also points to a statalist
discussion that happened in mid October of 2000.  I would point
you to a web link, but the Yahoo site keeps only a certain size
buffer of old messages (currently you can only go back to Nov. of
2000).  The archives at Harvard

appear to only go back to January 2001.  And the archives at

appear to go from 1994 through July 1998.

So, since you might have a hard time finding the discussion from
Oct. 2000, here is what I wrote on 12 Oct 2000 at the conclusion
of the discussion.

Subject: Re: Unbalanced Repeated Measures ANOVA

Al Feiveson provides a good
caution regarding an example I gave with a significant amount of
missing cells in a repeated measures ANOVA.

> Ken - I see that Stata will produce the ANOVA as you have
> indicated - but how good are the "F" statistics? If I am not
> mistaken, they won't really have an exact "F"-distribution even
> if the error terms are independent normal and homoscedastic. Of
> course, nothing is really normally distributed, etc, anyway, so
> this is probably a moot issue.

With complicated ANOVA designs having missing cells the "proper"
F-tests can be difficult (sometimes impossible) to construct.

In simpler designs where residual error is the only error term,
the missing cells in the design do not change the use of MSerror
for the denominator of the F test.
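
To make the simple case concrete, here is the test written out in symbols.
The notation (N, r, q) is introduced here only for illustration and is not
taken from the references cited in this thread.  It assumes independent
normal errors with a common variance and a hypothesis that is still
testable given the missing cells.

```latex
F \;=\; \frac{SS_{\text{effect}} / q}{SS_{\text{error}} / (N - r)}
  \;=\; \frac{MS_{\text{effect}}}{MS_{\text{error}}}
  \;\sim\; F_{q,\; N - r} \quad \text{under } H_0
```

Here N is the number of observations, r is the rank of the design matrix,
and q is the degrees of freedom of the hypothesis being tested.  Missing
cells can reduce r and change which hypotheses are testable, but the
denominator is still MSerror.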

In more complicated designs where there are different error terms
for various levels of the model, the expected mean squares in the
presence of missing cells can lead to very complicated tests.  A
discussion of this can be found in

    Milliken and Johnson, 1984, "Analysis of Messy Data, Volume 1:
       Designed Experiments", New York: Van Nostrand Reinhold

In particular, around page 395 it shows an example of how you
would form a test, and, as Al Feiveson alludes to, even the test
they contrive does not truly follow an F distribution.  They say
concerning the test that it

    "... does not have an exact F-distribution since (1) the
    statistic in the denominator does not have a distribution
    that is proportional to an exact chi-square distribution, and
    (2) the numerator and denominator may not be independently
    distributed. ..."
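
A rough sketch of one common way this arises, in notation assumed here for
illustration (it is not copied from Milliken and Johnson): suppose an
effect A is tested against the A-by-subject term, and the residual mean
square estimates sigma^2 alone.  In a balanced design both mean squares
share the same coefficient c on the A-by-subject variance component:

```latex
\begin{align*}
E(MS_A) &= \sigma^2 + c\,\sigma^2_{A \times id} + Q(A), \\
E(MS_{A \times id}) &= \sigma^2 + c\,\sigma^2_{A \times id},
\end{align*}
% so MS_A / MS_{A x id} is a proper F ratio under H_0 : Q(A) = 0.
%
% With missing cells the coefficients typically differ (c_1 \neq c_2):
\begin{align*}
E(MS_A) &= \sigma^2 + c_1\,\sigma^2_{A \times id} + Q(A), \\
E(MS_{A \times id}) &= \sigma^2 + c_2\,\sigma^2_{A \times id},
\end{align*}
% and a denominator with the right expectation has to be synthesized:
\begin{align*}
D &= \frac{c_1}{c_2}\, MS_{A \times id}
     + \Bigl(1 - \frac{c_1}{c_2}\Bigr) MS_{\text{error}}, \\
E(D) &= \sigma^2 + c_1\,\sigma^2_{A \times id}.
\end{align*}
```

A linear combination of mean squares such as D is in general not
proportional to a chi-square variable, which is point (1) of the quote,
and with unbalanced data the mean squares involved need not be
independent of the numerator, which is point (2).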

So if you have missing cells in a complicated ANOVA design you
will need to exercise caution in interpreting F-tests using
nonresidual error terms.

In practice, I believe many people close their eyes tight and
proceed with the tests as if the missing cells were not present.
With only a small percentage of the cells missing this may be
reasonable.  With a larger percentage of cells missing it may not
be reasonable.

Ken Higbee
