# st: Advice on Analysis (unbalanced data)

From: "Steichen, Thomas J."
Subject: st: Advice on Analysis (unbalanced data)
Date: Mon, 16 Jul 2007 16:54:30 -0400

```
I have a difficult little dataset for which I could use some advice on how to proceed.

In this data, samples of 3 items each of 3 Types (A, B, C) were (destructively) analyzed
"fresh" (i.e., unused) on a number of different analyses.

Other samples of these items were given to Subjects (1-9) for use and were subsequently
returned for analysis. Subjects 1-3 used type A, subjects 4-6 used type B, and subjects
7-9 used type C. Each subject could use multiple items on each of 2 Days (not all did,
and actual days were not the same for all subjects; e.g., day 1 for subject 1 may be a
different calendar day than day 1 for subject 2).

In variable Fresh, unused items were coded 1 and used items coded 0.

Thus, Fresh and Type are crossed, Subject is nested in Type, Day is nested in Subject,
and Rep is nested in Day.

Because of the destructive nature of the testing, if a subject used only a small number
of items, not all analyses could be carried out (resulting in unbalanced cell counts).

For purposes of coding, the "fresh" items were assigned to a "dummy" subject and day
(coded as subject 0 and day 0), resulting in missing cells.

The question of interest is whether the yield for an analyte differs between fresh and
used items.

Here is an example of the replicate counts in each cell:

. table subject type day, c(n r1) by(fresh) concise
--------------------------------------------------------------------
Fresh?    |                   Day and Type of item
and       | ---- Dummy 0 ---    ----- Day 1 ----    ----- Day 2 ----
Subject   |    A     B     C       A     B     C       A     B     C
----------+---------------------------------------------------------
Old       |
        1 |                        2                   2
        2 |                        1                   2
        3 |                        1                   2
        4 |                              3                   3
        5 |                              2                   3
        6 |                              2                   2
        7 |                                    2                   2
        8 |                                    2                   2
        9 |                                    2                   2
----------+---------------------------------------------------------
Fresh     |
  Dummy 0 |    3     3     3
--------------------------------------------------------------------

As stated above, any advice on appropriate analyses for this data would be
appreciated.

One can easily argue that the analysis problems arise out of the data structure.
If someone can propose a more useful way to code this data, that would be of
interest to me too.

Below is the data and some code to put it into Stata format, plus some code to
plot each response and some potential analyses.

I would appreciate knowing how you might attack it.

Tom

* create data set
input fresh type subject day rep r1 r2 r3 r4 r5 r6 r7 r8 r9 r10
1 1 0 0 1 5.83 .311 .119 .261 .088 .17 .30 .08 .49 .39
1 1 0 0 2 5.84 .307 .107 .255 .092 .17 .30 .12 .55 .38
1 1 0 0 3 6.24 .323 .098 .278 .100 .19 .29 .09 .58 .34
0 1 1 1 1 0.91 .053 .017 .058 .041 .13 .30 .08 .34 .
0 1 1 1 2 1.26 . . . . . . . . .
0 1 1 2 1 0.78 .048 .009 .038 . . . . . .
0 1 1 2 2 0.65 . . . . . . . . .
0 1 2 1 1 0.65 .064 .025 .073 .054 .13 .25 .09 .29 .36
0 1 2 2 1 0.86 .072 .024 .067 .031 .12 .29 .08 .31 .37
0 1 2 2 2 0.51 . . . . . . . . .
0 1 3 1 1 1.63 .141 .031 .100 . . . . . .
0 1 3 2 1 1.85 .148 .034 .095 .081 .15 .29 .09 .52 .
0 1 3 2 2 1.62 . . . . . . . . .
1 2 0 0 1 5.99 .251 .049 .199 .095 .17 .25 .07 .47 .42
1 2 0 0 2 5.53 .252 .048 .209 .071 .18 .31 .08 .49 .37
1 2 0 0 3 5.82 .239 .045 .189 .108 .19 .25 .09 .47 .41
0 2 4 1 1 2.08 .138 .025 .106 .052 .14 .28 .08 .38 .37
0 2 4 1 2 2.00 . . . .045 .14 .29 .09 .40 .35
0 2 4 1 3 2.04 . . . . . . . . .
0 2 4 2 1 1.50 .116 .026 .095 .035 .13 .27 .07 .33 .42
0 2 4 2 2 1.65 . . . .055 .15 .29 .09 .38 .
0 2 4 2 3 1.33 . . . .059 .15 .25 .08 .32 .
0 2 5 1 1 1.68 .150 .031 .118 .071 .14 .26 .08 .40 .35
0 2 5 1 2 1.65 . . . . . . . . .
0 2 5 2 1 1.39 .120 .027 .103 .048 .12 .28 .24 .31 .38
0 2 5 2 2 1.47 . . . .045 .14 .29 .08 .36 .
0 2 5 2 3 1.28 . . . . . . . . .
0 2 6 1 1 2.48 .189 .039 .189 .057 .15 .30 .09 .49 .27
0 2 6 1 2 3.03 . . . . . . . . .
0 2 6 2 1 2.34 .196 .042 .196 .121 .17 .26 .09 .47 .25
0 2 6 2 2 3.33 . . . .070 .17 .33 .08 .52 .
1 3 0 0 1 5.77 .294 .095 .253 .088 .18 .29 .08 .48 .41
1 3 0 0 2 5.52 .294 .092 .215 .102 .19 .28 .08 .52 .39
1 3 0 0 3 5.63 .306 .087 .241 .106 .18 .23 .07 .46 .42
0 3 7 1 1 1.17 .115 .034 .115 .037 .12 .25 .06 .24 .29
0 3 7 1 2 0.67 . . . .043 .12 .22 .06 .26 .
0 3 7 2 1 0.58 .065 .020 .065 .059 .16 .27 .09 .35 .26
0 3 7 2 2 0.54 . . . .042 .12 .26 .07 .26 .
0 3 8 1 1 2.34 .207 .054 .207 .114 .18 .26 .08 .47 .
0 3 8 1 2 2.18 . . . .102 .17 .22 .07 .42 .
0 3 8 2 1 2.38 .200 .057 .200 .073 .15 .29 .07 .42 .
0 3 8 2 2 2.14 . . . .101 .15 .23 .07 .40 .
0 3 9 1 1 1.91 .159 .051 .159 .073 .14 .23 .07 .33 .31
0 3 9 1 2 1.45 . . . . . . . . .
0 3 9 2 1 1.61 .154 .052 .154 .061 .17 .28 .07 .41 .30
0 3 9 2 2 1.64 . . . . . . . . .
end
compress
lab var fresh "Fresh?"
lab def fresh 1 "Fresh" 0 "Old"
lab val fresh fresh
lab var type "Type of item"
lab def type 1 "A" 2 "B" 3 "C"
lab val type type
lab var subject "Subject"
lab def subject 0 "Dummy 0"
lab val subject subject
lab var day "Day"
lab def day 0 "Dummy 0" 1 "Day 1" 2 "Day 2"
lab val day day
lab var rep "Replicate"
lab var r1 "Response 1"
lab var r2 "Response 2"
lab var r3 "Response 3"
lab var r4 "Response 4"
lab var r5 "Response 5"
lab var r6 "Response 6"
lab var r7 "Response 7"
lab var r8 "Response 8"
lab var r9 "Response 9"
lab var r10 "Response 10"
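
* Sketch (an assumption, not part of the original post): a quick check of
* the design described above. Among the used items, each subject should
* appear under exactly one type, and each response has a different number
* of nonmissing replicates because of the destructive testing.
bysort subject (type): assert type[1] == type[_N] if fresh == 0
foreach var of varlist r1-r10 {
    quietly count if !missing(`var')
    display "`var': " r(N) " nonmissing observations"
}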

* plot the data...
foreach var of varlist r1-r10 {
    tw   scatter `var' subject if type == 1, sym(Oh) mcol(blue)  ///
      || scatter `var' subject if type == 2, sym(Sh) mcol(red)   ///
      || scatter `var' subject if type == 3, sym(Dh) mcol(green) ///
      , legend(order(1 "A" 2 "B" 3 "C") rows(1) pos(1) ring(0))
    more
}

* potential analyses
foreach var of varlist r1-r10 {
    anova   `var' fresh type  / subject|type / day|subject|type /
    xtmixed `var' fresh type || subject:    || day:
    more
}
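
* Sketch (an assumption, not part of the original post): because the fresh
* items all share subject 0 and day 0, "|| subject:" above places the fresh
* items of types A, B and C in a single cluster. One possible recoding gives
* each type's fresh items their own cluster. The names subjid and dayid are
* hypothetical, introduced here for illustration only.
egen subjid = group(type subject)
egen dayid  = group(type subject day)
tabulate subjid type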

-----------------------------------
Thomas J. Steichen
steicht@rjrt.com
-----------------------------------

```