# Re: st: Advice on Analysis (unbalanced data)

From: Joseph Coveney; To: Statalist; Subject: Re: st: Advice on Analysis (unbalanced data); Date: Tue, 17 Jul 2007 09:39:49 +0900

```
Thomas J. Steichen wrote:

I have a difficult little dataset for which I could use some advice on how
to proceed.

In this data, samples of 3 items each of 3 Types (A, B, C) were
(destructively) analyzed "fresh" (i.e., unused) in a number of different
analyses.

Other samples of these items were given to Subjects (1-9) for use and were
subsequently returned for analysis. Subjects 1-3 used type A, subjects 4-6
used type B and subjects 7-9 used type C. Each subject could use multiple
items on each of 2 Days (not all did, and actual days were not the same for
all subjects, i.e., day 1 for subject 1 may be a different calendar day than
day 1 for subject 2).

In variable Fresh, unused items were coded 1 and used items coded 0.

Thus, Fresh and Type are crossed, Subject is nested in Type, Day is nested
in Subject, and Rep is nested in Day.

Because of the destructive nature of the testing, if a subject used only a
small number of items, not all analyses could be carried out (resulting in
unbalanced cell counts).

For purposes of coding, the "fresh" items were assigned to a "dummy" subject
and day (coded as subject 0 and day 0), resulting in missing cells.

The question of interest is whether the yield for an analyte differs between
fresh and used items.

Here is an example of the replicate counts in each cell:

. table subject type day, c(n r1) by(fresh) concise
--------------------------------------------------------------------
Fresh?    |                   Day and Type of item
and       | ---- Dummy 0 ---    ----- Day 1 ----    ----- Day 2 ----
Subject   |    A     B     C       A     B     C       A     B     C
----------+--------------------------------------------------------
Old       |
        1 |                        2                   2
        2 |                        1                   2
        3 |                        1                   2
        4 |                              3                   3
        5 |                              2                   3
        6 |                              2                   2
        7 |                                    2                   2
        8 |                                    2                   2
        9 |                                    2                   2
----------+--------------------------------------------------------
Fresh     |
  Dummy 0 |    3     3     3
--------------------------------------------------------------------

--------------------------------------------------------------------------------

It seems that a cell means model (Milliken & Johnson, Ch. 13, 1992) would be
the most straightforward approach to testing a null hypothesis about recovery
of analyte from fresh versus used items.  At least I think that you could get
away with testing that hypothesis with the missing-cell pattern shown.  You
would end up testing the hypothesis with a contrast matrix, and because you
have repeated measurements (days and replicates), you have three error terms
(subjects, days within subjects, and replicates within days), so you would
need to construct the test using the method in the FAQ that Ken Higbee just
referred to.  Trying to get at the expected mean squares in an unbalanced
mixed model is probably more trouble than it's worth, so -xtmixed- might be
a more amenable approach, again with a cell means model and a linear
combination of coefficients for testing the hypothesis.

Joseph Coveney

G. A. Milliken and D. E. Johnson, _Analysis of Messy Data. Volume I:
Designed Experiments_ (London: Chapman & Hall, 1992), esp. Ch. 13
(pp. 172--77), and Ch. 14 (pp. 178--90), with worked example in
Ch. 15 (pp. 191--95).

```
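A minimal Stata sketch of the -xtmixed- route described in the reply, offered as an illustration rather than as the poster's own code. It assumes that r1 (the variable counted in the -table- command) holds the analyte yield and that fresh, type, subject and day are coded as in the table; the names cellid and c1-c6 are hypothetical. It uses only -egen-, -tabulate-, -xtmixed- and -lincom-; in Stata 13 and later -xtmixed- is named -mixed-.

```stata
* Sketch only (hypothetical names: cellid, c1-c6). Assumes r1 holds the
* analyte yield and that fresh (1 = unused, 0 = used), type, subject, and
* day are coded as in the -table- output above.

* One cell per Fresh x Type combination. group() numbers cells in sort
* order of fresh, then type, so cells 1-3 should be used A, B, C and
* cells 4-6 fresh A, B, C; the cross-tabulation lets you confirm that.
egen cellid = group(fresh type)
tabulate cellid fresh
tabulate cellid, generate(c)

* Cell means model: no constant and one indicator per cell, so each
* coefficient is a cell mean. Random intercepts for subject and for day
* within subject, plus the residual (replicates within day), give the
* three error terms. The fresh items' dummy codes (subject 0, day 0) are
* simply treated as one more subject and day level here.
xtmixed r1 c1-c6, noconstant || subject: || day:, reml

* H0: sum of fresh cell means = sum of used cell means. This is the same
* null as equal averages; the estimate is 3 times the average difference.
lincom c4 + c5 + c6 - c1 - c2 - c3
```

Because each cell gets its own coefficient, the contrast weights the three types equally however many replicates each cell happens to have, which is the reason for preferring a cell means model with this unbalanced layout. Whether the dummy subject 0/day 0 should carry random effects at all is a judgment call this sketch does not resolve.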