
From: David Airey <david.airey@vanderbilt.edu>
To: statalist@hsphsun2.harvard.edu
Subject: re: Re: st: Extremely poor performance in repeated ANOVA
Date: Wed, 4 Feb 2004 10:14:11 -0600

Michael Ingre replied:

Ken Higbee <khigbee@stata.com>:

> I created a dataset based on the information you provided. I ran
> your -anova- on my 2.4 GHz computer running Linux. It finished
> in just under a minute. I do not know what SPSS and StatView are
> doing and so cannot fully explain the differences in timing.

I need to correct my timing a bit. My PowerBook (apparently) was not feeling very well yesterday. I ran it three times this morning on an 800 MHz iMac G4, and each run took between 3 minutes 29 seconds and 3 minutes 32 seconds. That is still, however, 100 times slower than SPSS.

On my computer, a 1.25 GHz PowerBook, the timing for this problem with Michael Ingre's data set was:

r; t=119.92 9:02:08

Most of this was due to the epsilon correction calculations. The uncorrected ANOVA table was completed in less than 30 seconds (probably ~ 20 s).
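For readers unfamiliar with the epsilon corrections mentioned above, here is a minimal Python sketch of the Greenhouse-Geisser epsilon for a single within-subject factor, computed from the k x k covariance matrix of the repeated measures. It only illustrates the textbook formula and is not a description of how Stata's -anova- computes it; the function name and the example matrices are invented for this note.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon for a symmetric k x k covariance matrix.

    `cov` is a list of lists. Epsilon is 1 under sphericity and can be as
    small as 1 / (k - 1) when sphericity is badly violated.
    """
    k = len(cov)
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]  # equal to column means (symmetry)

    # Double-center: remove row and column means, add back the grand mean.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    # For a symmetric matrix, trace(C @ C) equals the sum of squared entries.
    sum_of_squares = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


# Hypothetical example 1: compound symmetry (equal variances, equal covariances).
compound_symmetry = [[2.0, 1.0, 1.0],
                     [1.0, 2.0, 1.0],
                     [1.0, 1.0, 2.0]]

# Hypothetical example 2: unequal variances and covariances.
non_spherical = [[1.0, 0.9, 0.1],
                 [0.9, 1.0, 0.1],
                 [0.1, 0.1, 4.0]]

eps_cs = greenhouse_geisser_epsilon(compound_symmetry)
eps_ns = greenhouse_geisser_epsilon(non_spherical)
print(eps_cs)  # 1.0 up to rounding: no correction needed
print(eps_ns)  # below 1: degrees of freedom would be scaled down
```

The corrected test multiplies both the numerator and denominator degrees of freedom of the F test by epsilon, which is why the corrections require the full covariance matrix of the repeated measures and can cost far more than the uncorrected table.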

Data Desk 6.2 calculated the ANOVA table (using GLM) in less than 3 seconds:

Design:

Source    F/R   max df   EMS           F-Denom
Const     -          1   sbt+Const     sbt
sbt       R         16   sbt           Error
day       F          2   sbt*day+day   sbt*day
sbt*day   M         32   sbt*day       Error
tim       F         19   sbt*tim+tim   sbt*tim
sbt*tim   M        304   sbt*tim       Error
day*tim   F         38   day*tim       Error
Error     R        608
Total             1019
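The degrees of freedom in the design table follow from the layout mentioned later in this message: 17 subjects, each measured at 20 time points on each of 3 days. A short Python check, assuming a fully crossed, balanced design with one observation per cell (the variable names are mine, not Data Desk's):

```python
# Assumed layout: 17 subjects x 3 days x 20 time points, one observation per cell.
n_sbt, n_day, n_tim = 17, 3, 20

df = {
    "sbt": n_sbt - 1,
    "day": n_day - 1,
    "sbt*day": (n_sbt - 1) * (n_day - 1),
    "tim": n_tim - 1,
    "sbt*tim": (n_sbt - 1) * (n_tim - 1),
    "day*tim": (n_day - 1) * (n_tim - 1),
    # Residual term: the three-way sbt*day*tim interaction.
    "Error": (n_sbt - 1) * (n_day - 1) * (n_tim - 1),
}

# Total df is the number of observations minus one (for the constant).
total_df = n_sbt * n_day * n_tim - 1

for source, value in df.items():
    print(f"{source:8s} {value:5d}")
print(f"{'Total':8s} {total_df:5d}")
```

The individual terms add up to the total of 1019, which matches the table.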

ANOVA:

Source      df   SS        MS        F        P
Const        1   17708.3   17708.3   209.28   ≤ 0.0001
sbt         16   1353.83   84.6146   69.225   ≤ 0.0001
day          2   80.6608   40.3304   5.1077   0.0119
sbt*day     32   252.673   7.89602   6.4599   ≤ 0.0001
tim         19   113.157   5.95562   2.5548   0.0005
sbt*tim    304   708.676   2.33117   1.9072   ≤ 0.0001
day*tim     38   53.4961   1.40779   1.1517   0.2487
Error      608   743.171   1.22232
Total     1019   3305.67
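Each F statistic in the ANOVA table is the term's mean square divided by the mean square of the term named in the F-Denom column of the design table. The following Python sketch recomputes the ratios from the printed mean squares as a consistency check; the numbers are copied from the tables, and the dictionary names are mine.

```python
import math

# Mean squares as printed in the ANOVA table.
ms = {
    "Const": 17708.3, "sbt": 84.6146, "day": 40.3304, "sbt*day": 7.89602,
    "tim": 5.95562, "sbt*tim": 2.33117, "day*tim": 1.40779, "Error": 1.22232,
}

# Denominator term for each F test, from the F-Denom column of the design table.
f_denominator = {
    "Const": "sbt", "sbt": "Error", "day": "sbt*day", "sbt*day": "Error",
    "tim": "sbt*tim", "sbt*tim": "Error", "day*tim": "Error",
}

# F values as printed in the ANOVA table.
reported_f = {
    "Const": 209.28, "sbt": 69.225, "day": 5.1077, "sbt*day": 6.4599,
    "tim": 2.5548, "sbt*tim": 1.9072, "day*tim": 1.1517,
}

computed_f = {src: ms[src] / ms[den] for src, den in f_denominator.items()}

for src, f_value in computed_f.items():
    agrees = math.isclose(f_value, reported_f[src], rel_tol=1e-3)
    print(f"{src:8s} computed F = {f_value:9.4f}  reported F = {reported_f[src]:9.4f}  agrees: {agrees}")
```

Note that the fixed effects day and tim are tested against their interactions with subject (sbt*day and sbt*tim), not against the residual error.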

Data Desk does not require balanced data for univariate repeated measures ANOVA; a subject is not dropped completely because one repeated observation is missing. On the other hand, Data Desk offers no corrections. Data Desk can analyze repeated measures designs using MANOVA, but only in a limited way, unlike Stata. According to the manual, Data Desk could not, for example, compute Ingre's problem using MANOVA. Stata can.

No, the ANOVA did not finish. Or rather the epsilon corrections never finished. My conclusion was that I should use a different approach altogether, for two reasons. One is that the ANOVA I discussed online previously was actually a smaller test version of the one I really need to run. It turns out that the design matrix limits are too small in Stata SE. My inadequate understanding is that both Proc Mixed and R LME use alternative ways of representing matrices during internal calculations, and are able to compute problems of the size I am interested in. The second reason is that both Proc Mixed and LME allow different covariance structures to be modeled, which is more realistic for repeated measures situations.

> When everything is balanced there may be faster ways of getting

> to the same answer. But, Stata's -anova-, using the sweep

> operator, is able to handle designs that are not balanced

> (including having missing cells) and that may have other

> collinearities (from continuous variables included in the model).

> In those cases, the faster ways of getting to the answer may not

> hold.

Yes. That's it. Thank you, Ken, for making that point. SPSS and StatView only accept cases with complete data on all measurements. In this area Stata outperforms the competition.

The ability to analyze unbalanced designs with missing cells is intriguing, and I can think of many situations where it could be useful. Special care must be taken, though, when there is a lot of missing data or when the pattern of missing data is systematic.

Given the enormous speed improvement with (presumably) the alternative way of calculating ANOVAs, an alternative procedure for anova (for complete-case data) is high up on my wish list. And I guess also on David Airey's (did your anova finish at all?) and on those of others who do experimental research.

Ouch. Are there no alternatives in any of the xt- commands in Stata? This is usually where I get frustrated with what I don't know: the (in)applicability of the xt commands to experimental repeated measures data typically analyzed by ANOVA/MANOVA or mixed modeling.

> David Airey <david.airey@vanderbilt.edu> mentioned several

> alternatives for repeated measures data including Stata's

> -manova- command that was introduced in Stata 8. I personally

> like MANOVA over repeated measures ANOVA. (But there are some

> cases where the MANOVA cannot be done -- too many y variables

> compared to the number of observations -- where the repeated

> measures ANOVA can still be computed.)

MANOVA is an interesting alternative in many situations. I will consider it

when appropriate. If I'm not mistaken though, the present analysis would not

run in MANOVA because it would mean 3*20 dependent variables and only 17

subjects. This is also typical for many of our experiments (and some of our

field studies) so ANOVA would still be our main approach.

Yes, but another list member has repeatedly stated that GLLAMM has limited ability for modeling the covariance structure. When you take that course, report back!

David Airey <david.airey@vanderbilt.edu>:

> As for me, the more I use Stata, the more I like it, but the more I

> mess around with statistics, the more tools I wind up exploring (Data

> Desk, Stata, and R, so far).

Agreed. Stata is really growing on me. And this is of course part of my problem. I want Stata to be able to do it all ... I don't want to spend time in too many programs, but I have realized that there are limits even to Stata. Currently, though, I have my hands full with learning Stata and LISREL. And soon I will take a course in GLLAMM.

> For biologists using statistics, the main weaknesses of Stata are

> currently a lack of a routine like SAS Mixed or R LME/NLME

Mixed modeling is an area that I'm very interested in. I have no practical experience with it, but from what I've read it is the answer to many of my problems. And that's why I will take some time to learn GLLAMM, which as I understand it is the closest to Proc Mixed you can get in Stata.

> Please send me the data set if it's not private, and I will run on my

> Powerbook to compare times. I'm curious about this. I have a 1.25 GHz

> Powerbook.

Check your mail.

Finally, many thanks to Ken Higbee and David Airey for your time and knowledge.
