The Stata listserver

[no subject]

Another big difference between the overparameterized and the cell
means approach is the size of the underlying design matrix (the
X'X matrix).  In a cell means approach the X'X matrix is smaller
(often much smaller) and of full rank -- no columns/rows need to
be dropped.  In the overparameterized model the X'X matrix has
redundancies built in that end up getting dropped out.  That is
why I was commenting on comparing the degrees of freedom versus
the number of columns used in the X'X matrix for that particular
example. The D*C*B*G|A term has 8 d.f.s but used 72 columns in
the X'X matrix (all but 8 of which end up getting dropped due to
collinearity with other terms in the model).

Consider an anova with factors A (with 3 levels) and B (with 4
levels):

          |     B
          | 1  2  3  4
        1 | 1  2  3  4
    A   2 | 5  6  7  8
        3 | 9 10 11 12

There are 12 cells in this layout.

The overparameterized model that most people are familiar with
would be run by typing (assuming y is the dependent variable):

    . anova y A B A*B

The design matrix and d.f.s would be

      term    # of cols in X'X    df
      _cons         1              1
       A            3              2
       B            4              3
       A*B         12              6
       total       20             12

There are 8 (= 20-12) columns/rows dropped due to collinearity.

The cell means ANOVA approach is

    . egen cells = group(A B)
    . anova y cells, noconstant

This is just a oneway anova on the 12 cells that make up A and B.
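
For anyone who wants to try the cell means setup on contrived data,
here is a minimal sketch.  It assumes a balanced layout with 2
observations per cell, a purely random y, and Stata 11 or later for
the -ibn.- factor-variable prefix; none of that comes from the
example above.  -egen, group()- numbers the cells 1 to 12 by A and
then B, the same order as the layout table.

```stata
* Contrived balanced data: 3 levels of A, 4 of B, 2 obs per cell (assumed)
clear
set seed 2003
set obs 24
generate A = ceil(_n/8)                  // 8 observations per level of A
generate B = mod(ceil(_n/2) - 1, 4) + 1  // cycles 1..4 within each A
generate y = rnormal()                   // placeholder response

* One cell identifier, numbered 1..12 by A and then B
egen cells = group(A B)
tabulate A B                             // 2 observations in every cell

* Cell means fit: one coefficient per cell, no constant, nothing dropped
regress y ibn.cells, noconstant
```

The 12 coefficients reported by -regress- are the cell means of y;
tests for A, B, and A*B are then linear combinations of those
coefficients.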
The F-tests for A, B, and A*B are not automatically provided, but
can be obtained using -test- with the -accum- option.  Individual
degree-of-freedom tests, however, are easy to think about and

>> With your particular case it doesn't look like you can get a
>> S|A*B term (I am assuming A is crossed with B).  You say A has 20
>> levels and B has 2 and that there are 400 animals total.  Since
>> 20*2 = 400, I guess that means you have one animal per A*B
>> combination.  So you will not be able to estimate a S|A*B term
>> separate from the A*B term.  Maybe you will drop the A*B term
>> (and assume that the A*B interaction is insignificant).
> Factor A is isogenic strain (all animals genetically the same within 
> strain like twins or clones, but animals different between the 20 
> strains). Factor B is sex. I have 10 animals per sex, both sexes per 
> strain, so I should be able to get the term S|A*B, since I have 10 
> animals per A*B combination. 20*2 = 40 A*B levels, I have 400 animals, 
> so 10 per combination.

Oops.  In my message I said "20*2 = 400" -- duh!  You are fine --
as you say, you have 10 animals per A*B combo -- i.e., 20*2*10 = 400.

> Factors C, D, E, F, are drug treatment, test session period, stimulus 
> character 1, stimulus character 2.
>> I commend the idea of creating an example dataset and doing a dry
>> run of your analysis before collecting the data.  This is helpful
>> in complicated designs to help point out limitations or problems
>> you might run into.  In some cases it might set you back to
>> rethinking how you want to design your experiment.
> In my case I'm fairly limited in being able to obtain 10 animals per 
> sex per strain. Too expensive otherwise. So a within subject design 
> seems necessary in some fashion. The only real concern I had, carry 
> over effects of drug level (saline<->drugA<->drugB) were not a problem 
> in another paper where order was counterbalanced by animal and a rest 
> period between the three drug level test days was given. Of course, I 
> don't claim to know it is the best design. But I do think dividing up 
> the limited number of animals into a between group design will lack 
> power.

You are probably doing very well with your design.  I was just
pointing out in general that running a proposed analysis on
contrived data can help point out unforeseen problems.
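
As a rough illustration of such a dry run for the design discussed
in this thread, the sketch below builds only the skeleton: 20
strains by 2 sexes by 10 animals, with each animal measured under 3
drug levels.  The variable names and the random response are
placeholders, not something from the thread, and the other
within-subject factors are left out because their numbers of levels
were not given.

```stata
* Between-subject skeleton: 20 strains x 2 sexes x 10 animals = 400
clear
set seed 400
set obs 400
generate animal = _n
generate strain = ceil(_n/20)                  // 20 animals per strain
generate sex    = mod(ceil(_n/10) - 1, 2) + 1  // 10 per sex within strain
tabulate strain sex                            // every cell should show 10

* Within-subject factor: each animal tested under 3 drug levels
expand 3
bysort animal: generate drug = _n              // 1 = saline, 2 = drugA, 3 = drugB
generate y = rnormal()                         // contrived response
```

Running the proposed -anova- on a dataset like this shows right away
whether every term in the model can be estimated.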

I am reminded of my job as a graduate student providing
statistical consulting for graduate students in other scientific
fields who were working on their dissertation or thesis.  I
always felt very bad telling someone that they had spent a lot of
time (and possibly money) gathering data that wouldn't answer the
research question they had posed (usually due to confounding).
If they had popped in for a consultation (usually provided
for free by agreement between the different University
departments) before gathering their data, they would have saved
themselves a lot of time and headaches (and possibly graduated

Ken Higbee
StataCorp     1-800-STATAPC
