
From: khigbee@stata.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Extremely poor performance in repeated ANOVA
Date: Tue, 03 Feb 2004 12:53:00 -0600

Michael Ingre <Michael.Ingre@ipm.ki.se> asks:

> I have tried fitting a repeated measures anova in Stata and I was
> surprisingly disappointed with the performance. My dataset contains 17
> subjects observed 20 times a day during three different days. It is a simple
> two-factor repeated measures ANOVA with a total of 1020 observations.
>
> . anova dv subject day / subject*day time / subject*time day*time
> ,repeated(day time)
>
> I timed it this morning and Stata/SE (8.2) took 7 minutes 30 seconds to
> complete the analyses!!!!
>
> My computer is not the fastest in the world (PowerBook G4, 800Mhz, 640MB
> RAM) but SPSS run the same model in seconds!!! (SPSS report 2 seconds
> processor time but there is some overhead). And my experience from similar
> models in SPSS and StatView (StatView does not calculate epsilon) over the
> last five of years or so, is that it should run in seconds rather than
> minutes even if the model is considerably larger.

I created a dataset based on the information you provided. I ran your
-anova- on my 2.4 GHz computer running Linux. It finished in just under a
minute. I do not know what SPSS and StatView are doing and so cannot fully
explain the differences in timing.

The traditional (standard) approach to ANOVA is called the
"overparameterized ANOVA model". This is the approach used by the -anova-
command in Stata. In this approach, the SSCP (sums-of-squares and
cross-products) matrix is based on the full set of dummy (also called
indicator) variables and their interactions based on the terms listed in the
ANOVA. (We don't actually create the dummy variables, but the resulting SSCP
matrix is the same as if we did.) For this particular ANOVA we have a 492 by
492 SSCP matrix.
The 492 is based on the following breakdown of columns:

    Term                    Columns   d.f.
    --------------------------------------
    The constant                  1
    subjects                     17     16
    day                           3      2
    subj*day  (3*17)             51     32
    time                         20     19
    subj*time (17*20)           340    304
    day*time  (3*20)             60     38
    --------------------------------------
    Total                       492    411

The number of degrees of freedom for the model is 411. Stata uses the matrix
sweep operator on the resulting 492 by 492 matrix in order to solve the
normal equations. During the sweep, 80 of the columns are "swept" from the
matrix (set to zero, which indicates that they are dropped), leaving the
constant plus the 411 columns corresponding to the degrees of freedom for
the model.

When everything is balanced there may be faster ways of getting to the same
answer. But Stata's -anova-, using the sweep operator, is able to handle
designs that are not balanced (including having missing cells) and that may
have other collinearities (from continuous variables included in the model).
In those cases, the faster ways of getting to the answer may not hold.

Many years ago, when I encountered SAS in school (and I am guessing it is
still true), they had a -PROC ANOVA- that required a balanced design. If you
did not have a balanced design you needed to use -PROC GLM-; and as I
understand it, their -PROC GLM- uses a sweep operation (similar to what
Stata uses) to get at the answer. I would be surprised if the SAS PROC GLM
and Stata -anova- speeds are drastically different.

David Airey <david.airey@vanderbilt.edu> mentioned several alternatives for
repeated measures data, including Stata's -manova- command that was
introduced in Stata 8. I personally like MANOVA over repeated measures
ANOVA. (But there are some cases where the MANOVA cannot be done -- too many
y variables compared to the number of observations -- where the repeated
measures ANOVA can still be computed.)
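The column breakdown, and the way collinear columns fall out during a sweep, can be checked with a small script. What follows is only a sketch in Python, not Stata's actual implementation. The function names are made up for illustration, it uses exact fractions where production code would use floating point with a tolerance, and it assumes a complete design with one observation per cell. Because a pure-Python sweep of a 492 by 492 matrix would be slow, it only counts columns for the full 17 by 3 by 20 design and runs the sweep on a scaled-down design (3 subjects, 2 days, 4 times).

```python
from fractions import Fraction
from itertools import product


def column_counts(n_subj, n_day, n_time):
    """Dummy-variable columns contributed by each term of the model."""
    return {
        "constant": 1,
        "subject": n_subj,
        "day": n_day,
        "subject*day": n_subj * n_day,
        "time": n_time,
        "subject*time": n_subj * n_time,
        "day*time": n_day * n_time,
    }


def model_df(n_subj, n_day, n_time):
    """Model degrees of freedom (constant excluded) for a complete design."""
    s, d, t = n_subj - 1, n_day - 1, n_time - 1
    return s + d + s * d + t + s * t + d * t


def one_hot(index, size):
    """Indicator vector of the given size with a 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]


def design_rows(n_subj, n_day, n_time):
    """One row per subject/day/time cell, using the full set of dummies."""
    rows = []
    for s, d, t in product(range(n_subj), range(n_day), range(n_time)):
        rows.append(
            [1]
            + one_hot(s, n_subj)
            + one_hot(d, n_day)
            + one_hot(s * n_day + d, n_subj * n_day)
            + one_hot(t, n_time)
            + one_hot(s * n_time + t, n_subj * n_time)
            + one_hot(d * n_time + t, n_day * n_time)
        )
    return rows


def sscp(rows):
    """X'X as exact fractions, so a zero pivot is exactly zero."""
    p = len(rows[0])
    return [
        [Fraction(sum(r[i] * r[j] for r in rows)) for j in range(p)]
        for i in range(p)
    ]


def sweep_all(a):
    """Sweep each column in turn; drop any column whose pivot is zero."""
    p = len(a)
    kept, dropped = [], []
    for k in range(p):
        pivot = a[k][k]
        if pivot == 0:
            # Collinear with columns already swept: zero it out and move on.
            for i in range(p):
                a[i][k] = a[k][i] = Fraction(0)
            dropped.append(k)
            continue
        row_k = [a[k][j] / pivot for j in range(p)]
        for i in range(p):
            if i == k:
                continue
            factor = a[i][k]
            for j in range(p):
                a[i][j] -= factor * row_k[j]
            a[i][k] = -factor / pivot
        row_k[k] = 1 / pivot
        a[k] = row_k
        kept.append(k)
    return kept, dropped


if __name__ == "__main__":
    # Column total and model d.f. for the 17 x 3 x 20 design in the post.
    print(sum(column_counts(17, 3, 20).values()), model_df(17, 3, 20))
    # Sweep a scaled-down design and report kept/dropped column counts.
    kept, dropped = sweep_all(sscp(design_rows(3, 2, 4)))
    print(len(kept), len(dropped))
```

For the scaled-down design the sketch retains 18 of 36 columns: the constant plus 17 model degrees of freedom. By the same counting, the full design retains the constant plus 411 columns out of 492. The triple loop in the sweep is also a reminder of why the cost grows roughly with the cube of the number of columns.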
Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

