The Stata listserver

Re: st: GLM and ANOVA complaints

From   Joseph Coveney
To   Statalist
Subject   Re: st: GLM and ANOVA complaints
Date   Sun, 28 Sep 2003 14:22:23 +0900

David Airey mentions that his factorial repeated-measures analysis of variance 
is taking more than eight hours to finish.  David describes his analysis as 
"384 observations, 2 between subject factors, 4 within subject factors," which 
if it's balanced would be a 2 × 2 × 2 × 2 × 2 × 2 repeated-measures ANOVA, with 
the last four factors as repeated measurements.  I know that -anova- can take a 
while to complete when there are numerous factors and their interactions to 
estimate, but eight hours seems long to me for a problem of this size.
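
A quick check of the arithmetic behind that description, as a sketch that assumes a balanced design in which each of the six factors has exactly two levels (run from a do-file so that the trailing comments are accepted):

```stata
* Assumption: balanced design, two levels per factor
display 2^6          // number of design cells
display 384 / 2^6    // observations per cell
display 384 / 2^4    // subjects, at 16 repeated measurements each
```

Under those assumptions there are 64 cells with 6 observations apiece, contributed by 24 subjects.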

To see how rapidly this analysis would run on my laptop (2 GHz nominal, 512 
megabytes RAM, Windows XP), I created an artificial dataset that mimics David's 
in what I understand as his experimental design.  The do-file is attached 
below.  For reference, the between-subject factors are named prt (pretreatment) 
and trt (treatment), and the within-subject factors are named alphabetically.  
The fitted model was fully saturated, that is, it included all interaction 
terms, and I believe, although I am not certain, that I specified it 
correctly.  -anova- took 25 minutes, including the floppy disc access time to 
log the output.  This is longer, of course, than the 30 seconds claimed for 
SAS's PROC MIXED, but not hours longer.  I did not use (or need) a matrix size 
of 6000, but I doubt that the computation time would have increased 
substantially if I had set the matrix size limit that large.

Joseph Coveney


set more off
set matsize 2400
set obs 384
set seed 20030928
* First between-subject factor (pretreatment)
generate byte prt = _n > _N / 2
* Second between-subject factor (treatment)
sort prt
generate byte trt = mod(_n, 2)
* Subject identifier
sort trt prt
generate byte pid = mod(_n, 16) == 1
replace pid = sum(pid)
tabulate prt trt
* Balanced completely randomized factorial design
* First within-subjects factor
sort pid // Not really necessary
generate byte A = mod(_n, 2)
* Second within-subjects factor
sort pid A
generate byte B = mod(_n, 2)
* Third within-subjects factor
sort pid A B
generate byte C = mod(_n, 2)
* Fourth within-subjects factor
sort pid A B C
generate byte D = mod(_n, 2)
sort pid A B C D
by pid: generate float latent_variable = invnorm(uniform()) if _n == 1
by pid: replace latent_variable = latent_variable[1]
* sqrt() keeps the variance of dep at 1 before the factor effects are added
generate float dep = 0.7 * latent_variable + sqrt(1 - 0.7^2) * invnorm(uniform())
drop latent_variable
* Strictly additive (no interactions of any factors)
replace dep = dep - prt / 6 + trt / 6 - A / 6 + B / 6 - C / 6 + D / 6
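* Optional sanity check (an addition, not part of the original post):
* confirm that the artificial dataset is balanced before fitting the model.
* Each subject should contribute 16 rows and each of the 64 cells 6 rows.
* Note that -bysort- re-sorts the data; nothing below depends on sort order.
bysort pid: assert _N == 16
bysort prt trt A B C D: assert _N == 6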
capture log close
log using complicated_anova.smcl, replace
set rmsg on
anova dep prt trt prt*trt / prt*trt|pid ///
  A prt*A trt*A prt*trt*A / prt*trt*A|pid ///
  B prt*B trt*B prt*trt*B / prt*trt*B|pid ///
    A*B prt*A*B trt*A*B prt*trt*A*B / prt*trt*A*B|pid ///
  C prt*C trt*C prt*trt*C / prt*trt*C|pid ///
    A*C prt*A*C trt*A*C prt*trt*A*C / prt*trt*A*C|pid ///
    B*C prt*B*C trt*B*C prt*trt*B*C / prt*trt*B*C|pid ///
    A*B*C prt*A*B*C trt*A*B*C prt*trt*A*B*C / prt*trt*A*B*C|pid ///
  D prt*D trt*D prt*trt*D / prt*trt*D|pid ///
    A*D prt*A*D trt*A*D prt*trt*A*D / prt*trt*A*D|pid ///
    B*D prt*B*D trt*B*D prt*trt*B*D / prt*trt*B*D|pid ///
    C*D prt*C*D trt*C*D prt*trt*C*D / prt*trt*C*D|pid ///
    A*B*D prt*A*B*D trt*A*B*D prt*trt*A*B*D / prt*trt*A*B*D|pid ///
    A*C*D prt*A*C*D trt*A*C*D prt*trt*A*C*D / prt*trt*A*C*D|pid ///
    B*C*D prt*B*C*D trt*B*C*D prt*trt*B*C*D / prt*trt*B*C*D|pid ///
  A*B*C*D prt*A*B*C*D trt*A*B*C*D prt*trt*A*B*C*D
log close
help smileplot
