
Re: st: Advice on Analysis (unbalanced data)

From   Joseph Coveney <>
To   Statalist <>
Subject   Re: st: Advice on Analysis (unbalanced data)
Date   Tue, 17 Jul 2007 09:39:49 +0900

Thomas J. Steichen wrote:

I have a difficult little dataset for which I could use some advice on how
to proceed.

In this data, samples of 3 items each of 3 Types (A, B, C) were
(destructively) analyzed "fresh" (i.e., unused) on a number of different

Other samples of these items were given to Subjects (1-9) for use and were
subsequently returned for analysis. Subjects 1-3 used type A, subjects 4-6
used type B and subjects 7-9 used type C. Each subject could use multiple
items on each of 2 Days (not all did, and actual days were not the same for
all subjects, i.e., day 1 for subject 1 may be a different calendar day than
day 1 for subject 2).

In variable Fresh, unused items were coded 1 and used items coded 0.

Thus, Fresh and Type are crossed, Subject is nested in Type, Day is nested
in Subject, and Rep is nested in Day.

Because of the destructive nature of the testing, if a subject used only a
small number of items, not all analyses could be carried out (resulting in
unbalanced cell counts).

For purposes of coding, the "fresh" items were assigned to a "dummy" subject
and day (coded as subject 0 and day 0), resulting in missing cells.

The question of interest is whether the yield for an analyte differs between
fresh and used items.

Here is an example of the replicate counts in each cell:

. table subject type day, c(n r1) by(fresh) concise
Fresh?    |                   Day and Type of item
and       | ---- Dummy 0 ---    ----- Day 1 ----    ----- Day 2 ----
Subject   |    A     B     C       A     B     C       A     B     C
Old       |
        1 |                        2                   2
        2 |                        1                   2
        3 |                        1                   2
        4 |                              3                   3
        5 |                              2                   3
        6 |                              2                   2
        7 |                                    2                   2
        8 |                                    2                   2
        9 |                                    2                   2
Fresh     |
  Dummy 0 |    3     3     3


It seems that a cell means model (Milliken & Johnson, Ch. 13, 1992) would be
the most straightforward approach for testing a null hypothesis about recovery
of analyte from fresh and used items.  At least I think that you could get
away with testing that hypothesis given the missing-cell pattern shown.  You'd
end up testing the hypothesis with a contrast matrix.  Because you have
repeated measurements (days and replicates), you have three error terms, and
so you would need to construct the test using the method in the FAQ that Ken
Higbee just referred to.  Working out the expected mean squares in the
unbalanced mixed model is probably more trouble than it's worth, so -xtmixed-
might be a more amenable approach, again with a cell means model and a linear
combination of coefficients for testing the hypothesis.
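As a rough sketch of what that linear combination looks like, assume (this is not stated above) that the fixed cells are the six Fresh x Type combinations and that the three Types are weighted equally when averaging. Writing mu_{fj} for the mean yield with Fresh status f (1 = fresh, 0 = used) and Type j in {A, B, C}, the null hypothesis and the corresponding contrast would be:

```latex
% Cell means mu_{fj}: f = 1 (fresh) or 0 (used); j = A, B, C (assumed equal weights)
\[
  H_0:\; \tfrac{1}{3}\left(\mu_{1A} + \mu_{1B} + \mu_{1C}\right)
       - \tfrac{1}{3}\left(\mu_{0A} + \mu_{0B} + \mu_{0C}\right) = 0
\]
% Same hypothesis as a contrast on the cell-means coefficient vector
\[
  \beta = \left(\mu_{1A}, \mu_{1B}, \mu_{1C}, \mu_{0A}, \mu_{0B}, \mu_{0C}\right)^\top,
  \qquad
  c = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3},
            -\tfrac{1}{3}, -\tfrac{1}{3}, -\tfrac{1}{3}\right)^\top
\]
% Estimate and standard error of the contrast
\[
  \hat\theta = c^\top \hat\beta,
  \qquad
  \widehat{\mathrm{SE}}(\hat\theta) = \sqrt{c^\top \hat V c}
\]
```

Here hat-V is the estimated covariance matrix of the fixed-effect (cell mean) estimates. In a mixed model it reflects the estimated variance components as well as the residual variance, which is how the random-effects structure enters the test without any expected-mean-squares algebra. A post-estimation command such as -lincom- reports this kind of estimate and standard error for a linear combination of coefficients.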

Joseph Coveney

G. A. Milliken and D. E. Johnson, _Analysis of Messy Data. Volume I:
Designed Experiments_ (London: Chapman & Hall, 1992), esp. Ch. 13
(pp. 172--77), and Ch. 14 (pp. 178--90), with worked example in
Ch. 15 (pp. 191--95).

