# [no subject]

```
Another big difference between the overparameterized and the cell
means approach is the size of the underlying design matrix (the
X'X matrix).  In a cell means approach the X'X matrix is smaller
(often much smaller) and of full rank -- no columns/rows need to
be dropped.  In the overparameterized model the X'X matrix has
redundancies built in that end up getting dropped out.  That is
why I was commenting on comparing the degrees of freedom versus
the number of columns used in the X'X matrix for that particular
example. The D*C*B*G|A term has 8 d.f.s but used 72 columns in
the X'X matrix (all but 8 of which end up getting dropped due to
collinearity with other terms in the model).

Consider an anova with factors A (with 3 levels) and B (with 4
levels)

      |     B
      | 1  2  3  4
------+-----------
    1 | 1  2  3  4
A   2 | 5  6  7  8
    3 | 9 10 11 12

There are 12 cells in this layout.

The overparameterized model that most people are familiar with
would be run by typing (assuming y is the dependent variable):

. anova y A B A*B

The design matrix and d.f.s would be

term     # of cols in X'X    df
-------------------------------
_cons           1             1
A               3             2
B               4             3
A*B            12             6
-------------------------------
total          20            12

There are 8 (= 20-12) columns/rows dropped due to collinearity.

The cell means ANOVA approach is

. egen cells = group(A B)
. anova y cells, noconstant

This is just a oneway anova on the 12 cells that make up A and B.
The F-tests for A, B, and A*B are not automatically provided, but
can be obtained using -test- with the -accum- option.  Individual
degree-of-freedom tests, however, are easy to think about and
form.

>> With your particular case it doesn't look like you can get a
>> S|A*B term (I am assuming A is crossed with B).  You say A has 20
>> levels and B has 2 and that there are 400 animals total.  Since
>> 20*2 = 400, I guess that means you have one animal per A*B
>> combination.  So you will not be able to estimate a S|A*B term
>> separate from the A*B term.  Maybe you will drop the A*B term
>> (and assume that the A*B interaction is insignificant).
>
> Factor A is isogenic strain (all animals genetically the same within
> strain like twins or clones, but animals different between the 20
> strains). Factor B is sex. I have 10 animals per sex, both sexes per
> strain, so I should be able to get the term S|A*B, since I have 10
> animals per A*B combination. 20*2 = 40 A*B levels, I have 400 animals,
> so 10 per combination.

Oops.  In my message I said "20*2 = 400" -- duh!  You are fine --
as you say, you have 10 animals per A*B combo -- i.e., 20*2*10 = 400.

> Factors C, D, E, F, are drug treatment, test session period, stimulus
> character 1, stimulus character 2.
>
>> I commend the idea of creating an example dataset and doing a dry
>> run of your analysis before collecting the data.  This is helpful
>> in complicated designs to help point out limitations or problems
>> you might run into.  In some cases it might set you back to
>> rethinking how you want to design your experiment.
>
> In my case I'm fairly limited in being able to obtain 10 animals per
> sex per strain. Too expensive otherwise. So a within subject design
> seems necessary in some fashion. The only real concern I had, carry
> over effects of drug level (saline<->drugA<->drugB) were not a problem
> in another paper where order was counterbalanced by animal and a rest
> period between the three drug level test days was given. Of course, I
> don't claim to know it is the best design. But I do think dividing up
> the limited number of animals into a between group design will lack
> power.

You are probably doing very well with your design.  I was just
pointing out in general that running a proposed analysis on
contrived data can help point out unforeseen problems.

I am reminded of my job as a graduate student providing
statistical consulting for graduate students in other scientific
fields who were working on their dissertation or thesis.  I
always felt very bad telling someone that they had spent a lot of
time (and possibly money) gathering data that wouldn't answer the
research question they had posed (usually due to confounding).
If they had popped in for a consultation (usually provided
departments) before gathering their data, they would have saved
earlier).

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

```
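
The column counts in the A (3 levels) by B (4 levels) example can be checked numerically. The sketch below is not Stata code and is not how -anova- works internally; it is a small stdlib-only Python illustration, and every name in it (`rank`, `overparameterized_row`, `cell_means_row`) is made up for this example. It builds one design-matrix row per cell under each parameterization and compares the number of columns with the rank. It relies on the fact that X'X has the same rank as X, and it assumes a single observation per cell, which is enough to determine the rank.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a matrix via exact Gauss-Jordan elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c is collinear with earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


A_LEVELS, B_LEVELS = 3, 4
cells = list(product(range(A_LEVELS), range(B_LEVELS)))  # the 12 cells


def overparameterized_row(a, b):
    """_cons (1) + A indicators (3) + B indicators (4) + A*B indicators (12)."""
    const = [1]
    a_cols = [int(a == i) for i in range(A_LEVELS)]
    b_cols = [int(b == j) for j in range(B_LEVELS)]
    ab_cols = [int((a, b) == cell) for cell in cells]
    return const + a_cols + b_cols + ab_cols


def cell_means_row(a, b):
    """One indicator per cell, no constant."""
    return [int((a, b) == cell) for cell in cells]


X_over = [overparameterized_row(a, b) for a, b in cells]
X_cell = [cell_means_row(a, b) for a, b in cells]

over_cols, over_rank = len(X_over[0]), rank(X_over)
cell_cols, cell_rank = len(X_cell[0]), rank(X_cell)
dropped = over_cols - over_rank

print("overparameterized:", over_cols, "columns, rank", over_rank, "dropped", dropped)
print("cell means:       ", cell_cols, "columns, rank", cell_rank)
```

The overparameterized layout carries 20 columns for 12 estimable quantities, so 8 columns are redundant, while the cell means layout has exactly as many columns as its rank.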