Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


# st: RE: re: RM ANOVA, was SPSS vs. Stata

From: "Ploutz-Snyder, Robert (JSC-SK)[USRA]"
To: "statalist@hsphsun2.harvard.edu"
Subject: st: RE: re: RM ANOVA, was SPSS vs. Stata
Date: Mon, 2 Aug 2010 12:29:26 -0500

```
"Doesn't SPSS wrap GLM for its RM-ANOVA routines?"

Yes, but with repeated measures designs, SPSS (like SAS, Systat, and BMDP in the old days) uses listwise elimination: any subject who is missing an observation is dropped entirely.  Stata does not.  (Is there an option in Stata's -anova, repeated()- code to do so?)

"Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS."

Here's an example of how Stata fails to eliminate cases listwise for a fixed-factorial repeated measures ANOVA, compared to SPSS.

IN STATA:
webuse t43
anova score person drug, repeated(drug)

             Number of obs =      20     R-squared     =  0.9244
             Root MSE      = 3.06594     Adj R-squared =  0.8803

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |        1379     7         197      20.96     0.0000
           |
    person |       680.8     4       170.2      18.11     0.0001
      drug |       698.2     3  232.733333      24.76     0.0000
           |
  Residual |       112.8    12         9.4
-----------+----------------------------------------------------
     Total |      1491.8    19  78.5157895

Between-subjects error term:  person
                     Levels:  5         (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                            Huynh-Feldt epsilon        =  1.0789
                            *Huynh-Feldt epsilon reset to 1.0000
                            Greenhouse-Geisser epsilon =  0.6049
                            Box's conservative epsilon =  0.3333

                              ------------ Prob > F ------------
    Source |     df      F    Regular    H-F      G-G      Box
-----------+----------------------------------------------------
      drug |      3    24.76   0.0000   0.0000   0.0006   0.0076
  Residual |     12
----------------------------------------------------------------

IN SPSS:
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                             Type III Sum of Squares      df  Mean Square       F  Sig.
drug         Sphericity Assumed                    698.200       3      232.733  24.759  .000
             Greenhouse-Geisser                    698.200   1.815      384.763  24.759  .001
             Huynh-Feldt                           698.200   3.000      232.733  24.759  .000
             Lower-bound                           698.200   1.000      698.200  24.759  .008
Error(drug)  Sphericity Assumed                    112.800      12        9.400
             Greenhouse-Geisser                    112.800   7.258       15.540
             Huynh-Feldt                           112.800  12.000        9.400
             Lower-bound                           112.800   4.000       28.200

So Stata and SPSS agree on the repeated measures F-statistic for drug, because there are no missing data in this dataset.  However, if we eliminate an observation here and there for a couple of subjects, SPSS and Stata no longer agree, because Stata does not eliminate cases listwise.

For example IN STATA (using same dataset, but eliminating a couple of obs):

replace score = . in 1      /* eliminated person 1's score for drug 1 */
replace score = . in 10     /* eliminated person 3's score for drug 2 */

anova score person drug, repeated(drug)

             Number of obs =      18     R-squared     =  0.9414
             Root MSE      =  2.9068     Adj R-squared =  0.9004

    Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
     Model |  1357.28267     7  193.897525      22.95     0.0000
           |
    person |  653.704895     4  163.426224      19.34     0.0001
      drug |  702.504895     3  234.168298      27.71     0.0000
           |
  Residual |  84.4951049    10  8.44951049
-----------+----------------------------------------------------
     Total |  1441.77778    17  84.8104575

Between-subjects error term:  person
                     Levels:  5         (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                            Huynh-Feldt epsilon        =  0.5297
                            Greenhouse-Geisser epsilon =  0.4228
                            Box's conservative epsilon =  0.3333

                              ------------ Prob > F ------------
    Source |     df      F    Regular    H-F      G-G      Box
-----------+----------------------------------------------------
      drug |      3    27.71   0.0000   0.0019   0.0047   0.0102
  Residual |     10
----------------------------------------------------------------

NOTE that Stata is still using data from all subjects (levels = 5).

IN SPSS (same dataset):

Tests of Within-Subjects Effects

Source                             Type III Sum of Squares      df  Mean Square       F  Sig.
drug         Sphericity Assumed                    478.333       3      159.444  13.932  .004
             Greenhouse-Geisser                    478.333   1.268      377.157  13.932  .044
             Huynh-Feldt                           478.333   2.466      193.938  13.932  .008
             Lower-bound                           478.333   1.000      478.333  13.932  .065
Error(drug)  Sphericity Assumed                     68.667       6       11.444
             Greenhouse-Geisser                     68.667   2.537       27.071
             Huynh-Feldt                            68.667   4.933       13.920
             Lower-bound                            68.667   2.000       34.333

So in this admittedly simple example, SPSS reports F(3,6) = 13.932, p ~ .004, whereas Stata reports F(3,10) = 27.71, which is larger than the F(3,12) = 24.76 from the original analysis with no missing data.

Of course, with a sample size this tiny, we wouldn't trust either analysis.  The point is that the prevailing wisdom for fixed-factorial repeated measures ANOVA is to use listwise elimination, and Stata doesn't do this.  (And you get the same Stata results if you use the anova command without the repeated option but instead define the error terms manually--a process that is itself painful enough to avoid entirely if you have 2 or 3 factors, especially if more than 1 are repeated.)

I appreciate that it is possible to "manually" tell Stata to ignore listwise those subjects who are missing any data... However this can get more complicated when there is more than 1 repeated measures factor (example, drugs a b c, measured pre and post).  And... exactly what is Stata's analysis "by default" anyway?  I could not write that up as a standard repeated measures ANOVA because it isn't that.  To me, a straightforward improvement to Stata's -anova- would be to force it to ignore any subjects who are missing any repeated measures observations.  That alone would be useful.

Rob

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Airey, David C
Sent: Monday, August 02, 2010 10:55 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: re: RM ANOVA, was SPSS vs. Stata


> What SPSS still maintains over Stata is better ANOVA routines,
> particularly Repeated-Measures fixed-factor designs.  Stata treats RM
> designs a bit strangely, I believe because it seems to "wrap" ANOVA code
> around Regression methods.  It's non-intuitive and can provide results
> that aren't typical of RM ANOVA (consider how it uses full-n for
> fixed-factor RM ANOVA without listwise elimination of subjects who are
> missing an observation).  I would much prefer to see Stata invest in
> re-working their ANOVA code and analyses so that it is more consistent
> with SAS or SPSS methodologies, offers more in terms of assumption
> testing (ex. Sphericity tests), and is more intuitive.

Michael Mitchell pointed this out in his head to head to head comparison of Stata, SPSS, and SAS some years ago in a report posted at ATS UCLA.

I don't know if this is true anymore with version 11.1 of xtmixed and the margins functionality. This book shows use of xtmixed in designed experiments:

<http://www-personal.umich.edu/~bwest/almmussp.html>

BTW, you can test sphericity in Stata directly with the mvtest command or by asking for the univariate rm-anova corrections when you use the "repeated(varlist)" option to anova.

Doesn't SPSS wrap GLM for its RM-ANOVA routines?

Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
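
The "manual" workaround mentioned in the post (telling Stata to ignore, listwise, any subject with a missing observation) could be sketched roughly as follows. This is a minimal sketch and not code from the original message: the variable name `nmiss` is hypothetical, and it assumes the long-format `t43` data with one row per person-drug cell, so that a dropped observation appears as a missing `score` rather than as an absent row.

```stata
* Recreate the missing-data example from the post
webuse t43, clear
replace score = . in 1      // person 1, drug 1
replace score = . in 10     // person 3, drug 2

* Count missing scores per person (nmiss is a hypothetical helper variable)
bysort person: egen nmiss = total(missing(score))

* Restrict the analysis to persons with complete data (listwise by subject)
anova score person drug if nmiss == 0, repeated(drug)
```

With persons 1 and 3 excluded, the drug and error sums of squares work out by hand to 478.333 and 68.667, the values in the SPSS output, so this run should give the same F(3,6) = 13.93. The same per-subject flag should carry over to designs with more than one repeated factor, provided every cell has a row in the data; if missing cells are absent rows instead, a per-person row count would be needed in place of the missing-value count.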
