



st: RE: re: RM ANOVA, was SPSS vs. Stata


From   "Ploutz-Snyder, Robert (JSC-SK)[USRA]" <robert.ploutz-snyder-1@nasa.gov>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: re: RM ANOVA, was SPSS vs. Stata
Date   Mon, 2 Aug 2010 12:29:26 -0500

" Doesn't SPSS wrap GLM for its RM-ANOVA routines?"

Yes, but with repeated-measures designs SPSS (and SAS, Systat, and BMDP in the old days) uses listwise elimination. Stata does not. (Is there an option to Stata's -anova, repeated()- that does so?)




" Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS."

Here's an example of how Stata fails to eliminate cases listwise in a fixed-factorial repeated-measures ANOVA, compared with SPSS.


IN STATA:
webuse t43
anova score person drug, repeated(drug)

                           Number of obs =      20     R-squared     =  0.9244
                           Root MSE      = 3.06594     Adj R-squared =  0.8803

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |        1379     7         197      20.96     0.0000
                         |
                  person |       680.8     4       170.2      18.11     0.0001
                    drug |       698.2     3  232.733333      24.76     0.0000
                         |
                Residual |       112.8    12         9.4   
              -----------+----------------------------------------------------
                   Total |      1491.8    19  78.5157895   


Between-subjects error term:  person
                     Levels:  5         (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                                          Huynh-Feldt epsilon        =  1.0789
                                          *Huynh-Feldt epsilon reset to 1.0000
                                          Greenhouse-Geisser epsilon =  0.6049
                                          Box's conservative epsilon =  0.3333

                                            ------------ Prob > F ------------
                  Source |     df      F    Regular    H-F      G-G      Box
              -----------+----------------------------------------------------
                    drug |      3    24.76   0.0000   0.0000   0.0006   0.0076
                Residual |     12
              ----------------------------------------------------------------


IN SPSS:
                              Tests of Within-Subjects Effects
Measure: MEASURE_1
Source                           Type III Sum of Squares      df   Mean Square        F   Sig.
drug         Sphericity Assumed                  698.200       3       232.733   24.759   .000
             Greenhouse-Geisser                  698.200   1.815       384.763   24.759   .001
             Huynh-Feldt                         698.200   3.000       232.733   24.759   .000
             Lower-bound                         698.200   1.000       698.200   24.759   .008
Error(drug)  Sphericity Assumed                  112.800      12         9.400
             Greenhouse-Geisser                  112.800   7.258        15.540
             Huynh-Feldt                         112.800  12.000         9.400
             Lower-bound                         112.800   4.000        28.200


So Stata and SPSS agree on the repeated-measures F statistic for drug, because there are no missing data in this dataset. However, if we eliminate an observation here and there for a couple of subjects, SPSS and Stata no longer agree, because Stata does not eliminate or ignore cases listwise.
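With no missing cells the design is balanced, so the drug F is just the textbook subjects-by-treatments decomposition, and any package should agree. Below is a minimal Python sketch of that arithmetic. The scores are my own transcription of the t43 data (persons 1 to 5 in rows, drugs 1 to 4 in columns), which is an assumption. They do reproduce the person, drug, and residual sums of squares shown above (680.8, 698.2, 112.8).

```python
# Sketch: one-factor repeated-measures ANOVA on a complete (balanced)
# subjects x treatments table.
# ASSUMPTION: my transcription of -webuse t43- (rows = persons 1-5,
# columns = drugs 1-4).
T43 = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
    [26, 28, 14, 30],
]


def rm_anova(table):
    """Return SS for subjects, treatment, and error, plus F and its df."""
    n, k = len(table), len(table[0])              # subjects, levels
    cf = sum(map(sum, table)) ** 2 / (n * k)      # correction factor
    ss_total = sum(y * y for row in table for y in row) - cf
    ss_subj = sum(sum(row) ** 2 for row in table) / k - cf
    ss_treat = sum(sum(col) ** 2 for col in zip(*table)) / n - cf
    ss_error = ss_total - ss_subj - ss_treat      # subject x treatment
    df1, df2 = k - 1, (n - 1) * (k - 1)
    f = (ss_treat / df1) / (ss_error / df2)
    return ss_subj, ss_treat, ss_error, f, df1, df2


if __name__ == "__main__":
    ss_subj, ss_treat, ss_error, f, df1, df2 = rm_anova(T43)
    print(f"person SS = {ss_subj:.1f}, drug SS = {ss_treat:.1f}, "
          f"error SS = {ss_error:.1f}, F({df1},{df2}) = {f:.2f}")
```

Running it prints `person SS = 680.8, drug SS = 698.2, error SS = 112.8, F(3,12) = 24.76`, which matches both the Stata and the SPSS tables.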


For example, IN STATA (same dataset, but with a couple of observations eliminated):

replace score = . in 1      /* eliminated person 1's score for drug 1 */
replace score = . in 10     /* eliminated person 3's score for drug 2 */

anova score person drug, repeated(drug)

                           Number of obs =      18     R-squared     =  0.9414
                           Root MSE      =  2.9068     Adj R-squared =  0.9004

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  1357.28267     7  193.897525      22.95     0.0000
                         |
                  person |  653.704895     4  163.426224      19.34     0.0001
                    drug |  702.504895     3  234.168298      27.71     0.0000
                         |
                Residual |  84.4951049    10  8.44951049   
              -----------+----------------------------------------------------
                   Total |  1441.77778    17  84.8104575   


Between-subjects error term:  person
                     Levels:  5         (4 df)
     Lowest b.s.e. variable:  person

Repeated variable: drug
                                          Huynh-Feldt epsilon        =  0.5297
                                          Greenhouse-Geisser epsilon =  0.4228
                                          Box's conservative epsilon =  0.3333

                                            ------------ Prob > F ------------
                  Source |     df      F    Regular    H-F      G-G      Box
              -----------+----------------------------------------------------
                    drug |      3    27.71   0.0000   0.0019   0.0047   0.0102
                Residual |     10
              ----------------------------------------------------------------



NOTE that Stata is still using data from all subjects (levels = 5).
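The numbers Stata reports here are consistent with the following procedure: fit the additive model score = person + drug by least squares to the 18 remaining observations, then take the partial sum of squares for drug. That partial SS is the drop in residual SS when drug is added to a model that already contains person. Below is a minimal Python sketch of that reading. It is my own reconstruction, not Stata's code, and it uses my own transcription of t43 (an assumption), with None for the two deleted scores. The additive model is fitted by backfitting, which alternates the person and drug mean updates and converges to the least-squares fit. A matrix least-squares solver would do the same job.

```python
# Sketch: an analysis that keeps every subject, using unbalanced data.
# Drug SS = RSS(score ~ person) - RSS(score ~ person + drug), i.e. the
# partial SS for drug, with both models fitted to the observed cells only.
# ASSUMPTION: my transcription of -webuse t43-; None marks a deleted score.
DATA = [
    [None, 28, 16, 34],  # person 1: drug 1 deleted ("in 1")
    [14, 18, 10, 22],
    [24, None, 18, 30],  # person 3: drug 2 deleted ("in 10")
    [38, 34, 20, 44],
    [26, 28, 14, 30],
]


def observed(table):
    """(subject, level, score) for every non-missing cell."""
    return [(i, j, y) for i, row in enumerate(table)
            for j, y in enumerate(row) if y is not None]


def rss_subject_only(cells):
    """Residual SS when each subject is fitted by its own mean."""
    by_subj = {}
    for i, _, y in cells:
        by_subj.setdefault(i, []).append(y)
    return sum(sum((y - sum(v) / len(v)) ** 2 for y in v)
               for v in by_subj.values())


def rss_additive(cells, n_iter=500):
    """Residual SS of y = a[subject] + b[level], fitted by backfitting.

    Each update sets one set of effects to the mean residual given the
    other; this is coordinate descent on the least-squares criterion.
    """
    subjects = sorted({i for i, _, _ in cells})
    levels = sorted({j for _, j, _ in cells})
    a = {i: 0.0 for i in subjects}
    b = {j: 0.0 for j in levels}
    for _ in range(n_iter):
        for i in subjects:
            r = [y - b[j] for s, j, y in cells if s == i]
            a[i] = sum(r) / len(r)
        for j in levels:
            r = [y - a[s] for s, lev, y in cells if lev == j]
            b[j] = sum(r) / len(r)
    return sum((y - a[i] - b[j]) ** 2 for i, j, y in cells)


def keep_all_subjects_f(table):
    """Partial SS for drug, residual SS, F, and its degrees of freedom."""
    cells = observed(table)
    n_subj = len({i for i, _, _ in cells})
    n_lev = len({j for _, j, _ in cells})
    rss_full = rss_additive(cells)
    ss_drug = rss_subject_only(cells) - rss_full
    df1 = n_lev - 1
    df2 = len(cells) - (n_subj + n_lev - 1)  # N minus fitted parameters
    f = (ss_drug / df1) / (rss_full / df2)
    return ss_drug, rss_full, f, df1, df2


if __name__ == "__main__":
    ss_drug, rss, f, df1, df2 = keep_all_subjects_f(DATA)
    print(f"drug SS = {ss_drug:.3f}, residual SS = {rss:.3f}, "
          f"F({df1},{df2}) = {f:.2f}")
```

It prints `drug SS = 702.505, residual SS = 84.495, F(3,10) = 27.71`, in line with the table above. Persons 1 and 3 still contribute their remaining scores to both fits.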


IN SPSS (same dataset): 


                              Tests of Within-Subjects Effects
Source                           Type III Sum of Squares      df   Mean Square        F   Sig.
drug         Sphericity Assumed                  478.333       3       159.444   13.932   .004
             Greenhouse-Geisser                  478.333   1.268       377.157   13.932   .044
             Huynh-Feldt                         478.333   2.466       193.938   13.932   .008
             Lower-bound                         478.333   1.000       478.333   13.932   .065
Error(drug)  Sphericity Assumed                   68.667       6        11.444
             Greenhouse-Geisser                   68.667   2.537        27.071
             Huynh-Feldt                          68.667   4.933        13.920
             Lower-bound                          68.667   2.000        34.333




So in this admittedly simple example, SPSS gives F(3,6) = 13.932, p = .004, whereas Stata gives F(3,10) = 27.71, which is even larger than the F of 24.76 from the original analysis with no missing data.
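The SPSS result is what the balanced computation gives after persons 1 and 3 are dropped entirely. That leaves 3 subjects and (3 - 1)(4 - 1) = 6 error df. Below is a minimal Python sketch of listwise elimination followed by the standard one-factor repeated-measures ANOVA. It assumes my own transcription of the t43 scores, with None marking the two deleted values.

```python
# Sketch: listwise (subject-wise) elimination, then the balanced
# one-factor repeated-measures ANOVA on the complete subjects.
# ASSUMPTION: my transcription of -webuse t43-; None marks a deleted score.
DATA = [
    [None, 28, 16, 34],  # person 1
    [14, 18, 10, 22],    # person 2
    [24, None, 18, 30],  # person 3
    [38, 34, 20, 44],    # person 4
    [26, 28, 14, 30],    # person 5
]


def listwise(table):
    """Keep only subjects with every repeated measure observed."""
    return [row for row in table if all(y is not None for y in row)]


def rm_anova_f(table):
    """Treatment SS, error SS, F, and df for a complete subjects x levels table."""
    n, k = len(table), len(table[0])
    cf = sum(map(sum, table)) ** 2 / (n * k)          # correction factor
    ss_total = sum(y * y for row in table for y in row) - cf
    ss_subj = sum(sum(row) ** 2 for row in table) / k - cf
    ss_treat = sum(sum(col) ** 2 for col in zip(*table)) / n - cf
    ss_error = ss_total - ss_subj - ss_treat          # subject x treatment
    df1, df2 = k - 1, (n - 1) * (k - 1)
    return ss_treat, ss_error, (ss_treat / df1) / (ss_error / df2), df1, df2


if __name__ == "__main__":
    complete = listwise(DATA)
    ss_treat, ss_error, f, df1, df2 = rm_anova_f(complete)
    print(f"{len(complete)} subjects kept: drug SS = {ss_treat:.3f}, "
          f"error SS = {ss_error:.3f}, F({df1},{df2}) = {f:.3f}")
```

It prints `3 subjects kept: drug SS = 478.333, error SS = 68.667, F(3,6) = 13.932`, which is the Sphericity Assumed row of the SPSS table above.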

Of course, with a sample size this tiny, we wouldn't trust either analysis. The point is that the prevailing wisdom for fixed-factorial repeated-measures ANOVA is to use listwise elimination, and Stata doesn't do this. (And you get the same Stata results if you run -anova- without the repeated() option and instead define the error terms manually. That process is painful enough to avoid entirely when you have two or three factors, especially if more than one is repeated.)


I appreciate that it is possible to "manually" tell Stata to drop, listwise, any subject who is missing any data. However, this gets more complicated when there is more than one repeated-measures factor (for example, drugs a, b, and c, each measured pre and post). And what exactly is Stata's analysis "by default," anyway? I could not write it up as a standard repeated-measures ANOVA, because it isn't one. To me, a straightforward improvement to Stata's -anova- would be to make it ignore any subject who is missing any repeated-measures observation. That alone would be useful.
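A "manual" listwise filter that also handles several repeated factors only needs one check: does each subject have a non-missing score in every cell of the within-subject design? Below is a minimal Python sketch. It assumes long-format records of the form (subject, tuple of factor levels, score), with None for a missing score. The helper name and the record layout are hypothetical and are not part of Stata.

```python
# Sketch: listwise elimination for designs with one or more within-subject
# factors (e.g. drug x pre/post), with data in long format.
# ASSUMPTION: records are (subject, tuple_of_levels, score), score None if
# missing; drop_incomplete_subjects is a hypothetical helper, not Stata's.
from itertools import product


def drop_incomplete_subjects(records, factor_levels):
    """Return only the records of subjects who have a non-missing score in
    every combination of the within-subject factor levels."""
    required = set(product(*factor_levels))
    observed = {}
    for subj, levels, score in records:
        if score is not None:
            observed.setdefault(subj, set()).add(tuple(levels))
    keep = {s for s, cells in observed.items() if required <= cells}
    return [rec for rec in records if rec[0] in keep]


if __name__ == "__main__":
    # t43 with the two deleted scores (my transcription, an assumption)
    wide = {1: [None, 28, 16, 34], 2: [14, 18, 10, 22],
            3: [24, None, 18, 30], 4: [38, 34, 20, 44],
            5: [26, 28, 14, 30]}
    long_recs = [(p, (d + 1,), y) for p, row in wide.items()
                 for d, y in enumerate(row)]
    kept = drop_incomplete_subjects(long_recs, [[1, 2, 3, 4]])
    print(sorted({rec[0] for rec in kept}))
```

On the t43 example it prints `[2, 4, 5]`. For drugs a, b, and c measured pre and post, you would pass `factor_levels=[["a", "b", "c"], ["pre", "post"]]` and level tuples such as `("b", "post")`.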



Rob





-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Airey, David C
Sent: Monday, August 02, 2010 10:55 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: re: RM ANOVA, was SPSS vs. Stata


> What SPSS still maintains over Stata is better ANOVA routines,
> particularly Repeated-Measures fixed-factor designs.  Stata treats RM
> designs a bit strangely, I believe because it seems to "wrap" ANOVA code
> around Regression methods.  It's non-intuitive and can provide results
> that aren't typical of RM ANOVA (consider how it uses full-n for
> fixed-factor RM ANOVA without listwise elimination of subjects who are
> missing an observation).  I would much prefer to see Stata invest in
> re-working their ANOVA code and analyses so that it is more consistent
> with SAS or SPSS methodologies, offers more in terms of assumption
> testing (ex. Sphericity tests), and is more intuitive.

Michael Mitchell pointed this out in his head-to-head-to-head comparison of Stata, SPSS, and SAS some years ago, in a report posted at UCLA ATS.

I don't know whether this is still true with -xtmixed- in Stata 11.1 and the -margins- command. This book shows the use of xtmixed in designed experiments:

<http://www-personal.umich.edu/~bwest/almmussp.html>

BTW, you can test sphericity in Stata directly with the mvtest command or by asking for the univariate rm-anova corrections when you use the "repeated(varlist)" option to anova.
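The univariate corrections come from epsilon estimates. Below is a minimal Python sketch of the textbook formulas for one within-subject factor with k levels and n subjects:

- Greenhouse-Geisser epsilon = tr(D)^2 / ((k - 1) tr(D^2)), where D is the double-centered sample covariance matrix of the k repeated measures.
- Huynh-Feldt epsilon = (n(k - 1)GG - 2) / ((k - 1)(n - 1 - (k - 1)GG)).

These are standard formulas, not taken from Stata's or SPSS's code. The t43 scores are my own transcription, which is an assumption. With those inputs the sketch reproduces the epsilons in Rob's Stata output above: GG 0.6049, HF 1.0789 (before the reset to 1), and Box 0.3333.

```python
# Sketch: Greenhouse-Geisser and Huynh-Feldt epsilons for one
# within-subject factor, from the double-centered covariance matrix.
# ASSUMPTION: my transcription of -webuse t43- (rows = persons,
# columns = drugs); textbook formulas, not taken from Stata or SPSS code.
T43 = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
    [26, 28, 14, 30],
]


def covariance(table):
    """Sample (n - 1) covariance matrix of the k repeated measures."""
    n, k = len(table), len(table[0])
    m = [sum(row[j] for row in table) / n for j in range(k)]
    return [[sum((row[a] - m[a]) * (row[b] - m[b]) for row in table) / (n - 1)
             for b in range(k)] for a in range(k)]


def epsilons(table):
    """Return (Greenhouse-Geisser, Huynh-Feldt, Box lower bound)."""
    n, k = len(table), len(table[0])
    s = covariance(table)
    rm = [sum(r) / k for r in s]     # row means (= column means; S symmetric)
    gm = sum(rm) / k                 # grand mean of S
    d = [[s[a][b] - rm[a] - rm[b] + gm for b in range(k)] for a in range(k)]
    tr_d = sum(d[a][a] for a in range(k))
    tr_d2 = sum(d[a][b] ** 2 for a in range(k) for b in range(k))  # tr(D D)
    gg = tr_d ** 2 / ((k - 1) * tr_d2)
    hf = (n * (k - 1) * gg - 2) / ((k - 1) * (n - 1 - (k - 1) * gg))
    return gg, hf, 1 / (k - 1)


if __name__ == "__main__":
    gg, hf, box = epsilons(T43)
    print(f"GG = {gg:.4f}, HF = {hf:.4f} (capped at 1 when used), "
          f"Box = {box:.4f}")
```

SPSS's Greenhouse-Geisser df of 1.815 is 3 × 0.6049. Its Huynh-Feldt df of 3.000 reflects the same reset of epsilon to 1.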

Doesn't SPSS wrap GLM for its RM-ANOVA routines?

Can you post an example of what you are talking about, re listwise elimination? I don't have SPSS.



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP