
Re: st: matsize


From   David Airey <david.airey@vanderbilt.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: matsize
Date   Mon, 18 Aug 2003 10:53:28 -0500

David Airey <david.airey@vanderbilt.edu> asks:

> How do you know before running a model, what the matsize needs to be?
> Is there an exact size you can determine beforehand?
>
> I'm going to work with a complex ANOVA and want to figure out the
> matsize needed, to see if Stata 8/SE can handle it, if I need to
> simplify the design, or if I need to purchase additional RAM. My
> computer can have 1000 MB max.
>
> ANOVA model:
>
> between subject factors:
> A: 20 levels
> B: 2 levels
>
> random subject factor nested in A and B:
> S: 400 animals total, 20 per A level, 10 per B level
>
> within subject factors (all crossed):
> C: 3 levels
> D: 4 levels
> E: 3 levels
> F: 2 levels
>
> In the meantime, I'm working out EMSs and a fabricated data set to see
> what happens on my machine.

Ken kindly replied:

For the benefit of others (I know David has seen it already) --
look at the example at

    http://www.stata.com/support/faqs/stat/anova2.html#expand911

which shows a complicated repeated measures ANOVA.  In that
particular case, I mentioned (without supporting justification)
that I needed to set matsize to 449 to run that particular
-anova-.  Where did that number come from?

        1    the constant
    +   2    A with 2 levels
    +   4    G|A with a total of 4 levels (2*2)
    +   2    B with 2 levels
    +   4    B*A (2*2)
    +   8    B*G|A (2*2*2)
    +  16    S|B*G|A (2*2*2*2)
    +   3    C with 3 levels
    +   6    C*A (3*2)
    +  12    C*G|A (3*2*2)
    +   6    C*B (3*2)
    +  12    C*B*A (3*2*2)
    +  24    C*B*G|A (3*2*2*2)
    +  48    C*S|B*G|A (3*2*2*2*2)
    +   3    D with 3 levels
    +   6    D*A (3*2)
    +  12    D*G|A (3*2*2)
    +   6    D*B (3*2)
    +  12    D*B*A (3*2*2)
    +  24    D*B*G|A (3*2*2*2)
    +  48    D*S|B*G|A (3*2*2*2*2)
    +   9    D*C (3*3)
    +  18    D*C*A (3*3*2)
    +  36    D*C*G|A (3*3*2*2)
    +  18    D*C*B (3*3*2)
    +  36    D*C*B*A (3*3*2*2)
    +  72    D*C*B*G|A (3*3*2*2*2)
    -----
    = 448

At this moment, I don't remember if I actually needed 449 (as I
claimed in the FAQ) or 448 (as computed above).  I think the 448
should be large enough.

You can follow the same exercise for your example and possibly
add 1 just for safe measure.  Write down your model and then for
each term multiply the number of levels for each factor in the
term, then add them all up.
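
The counting rule Ken describes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything Stata runs: the names `LEVELS`, `TERMS`, and `columns` are made up for this sketch, and each term is written simply as the string of factors it involves, on the assumption (taken from the tally above) that nesting (|) and crossing (*) are counted the same way.

```python
from math import prod

# Levels per factor in the FAQ example, read off the tally above.
LEVELS = {"A": 2, "G": 2, "B": 2, "S": 2, "C": 3, "D": 3}

# Between-subject terms, each written as the factors it involves.
# "" is the constant; "GA" stands for G|A, "SBGA" for S|B*G|A, and so on.
BETWEEN = ["", "A", "GA", "B", "BA", "BGA", "SBGA"]

# Full term list in the order of the tally: the between terms, then C and D
# crossed with each of them, then D*C crossed with all but the S term.
TERMS = (
    BETWEEN
    + ["C" + t for t in BETWEEN]
    + ["D" + t for t in BETWEEN]
    + ["DC" + t for t in BETWEEN[:-1]]
)


def columns(term):
    """Columns a term contributes: the product of its factors' levels."""
    return prod(LEVELS[f] for f in term)  # the empty product is 1 (constant)


total = sum(columns(t) for t in TERMS)
print(total)  # 448
```

To try another design, replace `LEVELS` and `TERMS` with the factors and terms of that model, and add 1 to the result if you want the safety margin Ken mentions.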

Now that is useful to see! Thank you.

When you compare the numbers from doing this to the degrees of
freedom for each of the terms, it becomes clear real quickly why
they call it the "overparameterized ANOVA model".

I keep hearing this term, but I don't see why it matters. Is it a bad thing?
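
As a small illustration of the comparison Ken suggests, the sketch below sets the column count against the degrees of freedom for two crossed factors with 3 levels each (the sizes of C and D in the FAQ example). It assumes the usual balanced-design rule that a purely crossed term has degrees of freedom equal to the product of (levels - 1); nested terms follow a different rule and are not covered. The names `n_columns` and `n_df` are made up for this sketch.

```python
from math import prod

# Two crossed factors with 3 levels each, as for C and D in the FAQ example.
levels = {"C": 3, "D": 3}
terms = [("C",), ("D",), ("C", "D")]


def n_columns(term):
    """Columns in the overparameterized design matrix: product of levels."""
    return prod(levels[f] for f in term)


def n_df(term):
    """Degrees of freedom of a purely crossed term: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in term)


for term in terms:
    print("*".join(term), n_columns(term), n_df(term))
```

Every term carries more columns than it has degrees of freedom (3 versus 2 for a main effect, 9 versus 4 for the interaction), which is the sense in which the model has more parameters than it can estimate separately.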

With your particular case it doesn't look like you can get a
S|A*B term (I am assuming A is crossed with B).  You say A has 20
levels and B has 2 and that there are 400 animals total.  Since
20*2 = 400, I guess that means you have one animal per A*B
combination.  So you will not be able to estimate a S|A*B term
separate from the A*B term.  Maybe you will drop the A*B term
(and assume that the A*B interaction is insignificant).

Factor A is isogenic strain (all animals are genetically the same within a strain, like twins or clones, but differ between the 20 strains). Factor B is sex. I have 10 animals per sex and both sexes in every strain, so I should be able to get the S|A*B term: 20*2 = 40 A*B combinations and 400 animals gives 10 animals per combination.
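
The cell arithmetic in this reply can be checked with a short sketch. The variable names are made up for illustration, and the last line assumes the S|A*B term is counted the way Ken counted S|B*G|A in his tally, as the product of the levels of every factor in the term.

```python
# Design figures stated in the message.
n_strains = 20   # factor A
n_sexes = 2      # factor B
n_animals = 400  # subjects S

cells = n_strains * n_sexes                     # A*B combinations
per_cell, leftover = divmod(n_animals, cells)   # animals per combination

# Columns for S|A*B under the counting rule used in the tally (assumption).
s_columns = per_cell * n_strains * n_sexes

print(cells, per_cell, leftover, s_columns)  # 40 10 0 400
```

With 10 animals in each of the 40 cells and none left over, the design is balanced and the subject term is not confounded with A*B.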

Factors C, D, E, and F are drug treatment, test session period, stimulus character 1, and stimulus character 2.


I commend the idea of creating an example dataset and doing a dry
run of your analysis before collecting the data.  This is helpful
in complicated designs to help point out limitations or problems
you might run into.  In some cases it might set you back to
rethinking how you want to design your experiment.

In my case I'm fairly limited in being able to obtain 10 animals per sex per strain; anything more is too expensive. So a within-subject design seems necessary in some fashion. The only real concern I had, carryover effects of drug level (saline<->drugA<->drugB), was not a problem in another paper, where order was counterbalanced by animal and a rest period was given between the three drug-level test days. Of course, I don't claim to know it is the best design. But I do think dividing up the limited number of animals into a between-group design will lack power.

-Dave
