The Stata listserver

Re: st: GLM and ANOVA complaints

From   David Airey
Subject   Re: st: GLM and ANOVA complaints
Date   Sat, 27 Sep 2003 13:32:00 -0500

David Airey referred to a book by R. B. Darlington in which SAS's PROC GLM was
described as being able to provide different specifications of indicator
variables. I am not familiar with R. B. Darlington's reference or with the
phrase "ANOVA effect parameters," but after posting a response earlier, it
dawned on me that David might be referring to specifying the contrasts in terms
of sum-to-zero dummy variables in lieu of 0/1 dummy variables. If so, then
this is what -desmat- and other sources refer to as deviation contrasts.
There's a description of this in S. Rabe-Hesketh & B. Everitt, _A Handbook of
Statistical Analyses Using Stata_, 2nd Edition (Boca Raton: Chapman &
Hall/CRC, 2000), pp. 72-75, among other places.

Joseph kindly suggested a user-written program, -desmat-, as a solution to my first complaint, which was that glm seems to require the user to specify the underlying indicator variables for multicategorical variables. No, it's not that hard to do this on one's own, but it is tedious and prone to error. ANOVA/MANOVA in Stata creates the underlying variables as needed. Why doesn't glm in Stata do the same? It should.
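
To make the two coding schemes concrete, here is a minimal sketch in Python (not Stata) of 0/1 indicator coding next to sum-to-zero coding for a single three-level factor. The function names and the choice of the first level as the reference are assumptions made for illustration; this is not how -desmat-, xi, or anova build their design matrices internally.

```python
def indicator_coding(values, levels):
    """0/1 dummy columns, one per level except the first (the reference)."""
    return [[1 if v == lev else 0 for lev in levels[1:]] for v in values]


def sum_to_zero_coding(values, levels):
    """Sum-to-zero columns: the reference level is coded -1 in every column."""
    ref = levels[0]
    return [
        [-1 if v == ref else (1 if v == lev else 0) for lev in levels[1:]]
        for v in values
    ]


levels = ["a", "b", "c"]          # hypothetical factor levels
values = ["a", "b", "c", "b"]     # hypothetical observations

ind = indicator_coding(values, levels)
dev = sum_to_zero_coding(values, levels)

for v, i_row, d_row in zip(values, ind, dev):
    print(v, i_row, d_row)
```

Both schemes use k-1 columns for a k-level factor; they differ only in how the reference level is coded, which changes what the fitted coefficients mean (differences from the reference level versus deviations from the mean of the level means).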

What Darlington said (p. 225, Regression and Linear Models, 1990) was,

"The ability to create a whole set of coded variables with a single command is the fundamental distinction between an ordinary regression program and a general linear model program. We will use the term GLM to refer to these programs as a class. Although we think of a multicategorical variable as a single variable, the GLM program treats it as a set of variables."

My complaint was that glm in Stata (it seems to me) is not smart enough on its own to generate the underlying indicator variables needed to represent multicategorical variables. It is too much like regression in its syntax requirements for variables. And xi handles only up to two-way interactions, though xi can be used with other commands.
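
As a rough illustration of why hand-coding gets tedious beyond two-way interactions, the following Python sketch (not Stata) builds the columns for a three-way interaction as products of one indicator column from each factor. The helper names and factor levels are hypothetical, and 0/1 coding with the first level as reference is assumed.

```python
from itertools import product


def indicators(value, levels):
    """0/1 dummy columns for one factor, omitting the first (reference) level."""
    return [1 if value == lev else 0 for lev in levels[1:]]


def interaction_columns(row, factor_levels):
    """Products of one indicator column from each factor, for one observation."""
    per_factor = [indicators(v, levs) for v, levs in zip(row, factor_levels)]
    cols = []
    for combo in product(*per_factor):
        term = 1
        for x in combo:
            term *= x
        cols.append(term)
    return cols


# Hypothetical factors with 2, 3, and 2 levels.
factor_levels = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]

print(interaction_columns(("a2", "b3", "c2"), factor_levels))
print(interaction_columns(("a1", "b3", "c2"), factor_levels))
```

With 2, 3, and 2 levels the three-way term needs (2-1)(3-1)(2-1) = 2 columns. The count grows multiplicatively with the number of levels, which is why generating these columns automatically matters.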

Regarding the other complaint, my work computer has spent 8 hours so far on an ANOVA (384 observations, 2 between-subject factors, 4 within-subject factors, matsize 6000). A statistician at work ran an equivalent data set (so he said) in 30 seconds using SAS PROC MIXED. Once my machine finishes, I'm handing him the same data set and asking him to run it with an equivalent model in SAS PROC GLM and SAS PROC MIXED. It should be interesting to compare. I just cannot imagine that the approaches were the same, given such different timings.

