
Re: st: group size needed for mixed models (binary response)

From   Jeph Herrin
Subject   Re: st: group size needed for mixed models (binary response)
Date   Mon, 26 Nov 2007 20:59:26 -0500


I would say your doubts are well founded. There aren't really
enough fawns/doe to estimate a mixed model. I often run into
this problem with hospital admissions, where the same patient
can be admitted twice in 12 months for the same condition, but
most patients have just one admission.

First, you should estimate the intra-class correlation of survival
for each of your study groups (in Stata use -loneway-). If this
is not much different from zero, then you can ignore the grouping
and use standard logit.
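
A minimal sketch of that check, assuming the data are in memory with one row per fawn and hypothetical variable names -died- (0/1 outcome), -doe_id- (mother identifier) and -species- (group indicator); none of these names come from the original data set:

```stata
* Assumed (hypothetical) variables: died = 0/1 outcome per fawn,
* doe_id = mother identifier, species = study group indicator.

* One-way ANOVA estimate of the intraclass correlation of survival
* within mothers, computed separately for each species.
bysort species: loneway died doe_id

* After a single run, the estimate is also left behind in r(rho).
loneway died doe_id if species == 1
display "ICC estimate = " r(rho)
```

With nearly all does contributing a single fawn, expect the confidence interval around the estimate to be wide; that is itself informative about how little the data can say about a family effect.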

If there is a group effect, then my instinct would be to select
a random fawn per mother as the analytic sample. If there is
reason to think that number of siblings might affect survival,
you can add a covariate for each retained fawn equal to the
number of siblings. Then again use standard logit, because you
are working with only one fawn/doe.
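
The selection step could look something like the following. This is only a sketch under the same assumed variable names (-died-, -doe_id-, -species-); the seed value and the new variables -u-, -n_sibs- and -keep_fawn- are arbitrary choices made for illustration:

```stata
* Assumed (hypothetical) variables: died, doe_id, species, one row per fawn.

set seed 20071126                  // arbitrary; makes the random draw reproducible
generate double u = runiform()     // random sort key for each fawn

* Number of siblings in the sample, computed before any fawn is dropped.
bysort doe_id: generate n_sibs = _N - 1

* Flag one randomly chosen fawn per doe (the one with the smallest u).
bysort doe_id (u): generate byte keep_fawn = (_n == 1)

* Standard logit on the one-fawn-per-doe sample.
logit died i.species n_sibs if keep_fawn
```

Flagging rather than dropping keeps the full data in memory, so the draw can be repeated with a different seed to see how sensitive the result is to which fawn was retained.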

Hope this helps,

Susan Lingle wrote:
Dear Stata-Listers

My question is a statistical one, not anything specific to use of Stata.
From reading the archives, there are clearly many knowledgeable people
out there, and I am hoping someone can advise whether a mixed model is appropriate to use to analyse my data.

I have a data set for deer fawns, in which I want to test whether fawns of one species, whitetails, are more likely to die from predation during summer than mule deer. I plan to run a separate analysis to test whether the other species, mule deer, is more likely than whitetails to die during winter. For the summer sample, there are 129 whitetail fawns from 124 mothers and 207 mule deer from 177 mothers. For the winter sample, there are 26 whitetail fawns from 25 mothers, and 129 mule deer from 103 mothers. There is one measurement (live or die) for each fawn.

Someone strongly recommended that I use a mixed model with the mother’s identity as a random factor to analyse the survival data (e.g., xtmelogit in Stata). I certainly appreciate the value of including family effects as random factors when there is a large enough family to estimate those effects, or the variance associated with those effects. But in this case, most females have one fawn so the data appear insufficient to estimate random effects or the variance, and I believe the latter is needed to estimate an intercept.

I have searched far and wide for an answer. The closest thing I found, and it seems to make sense, is an article suggesting that a large group size (n=50) as well as a large number of groups (n=100) are needed for a mixed effects logistic regression to produce decent estimates of fixed effects as well as random effects (citation below). They found severe flaws when group size was less than 5. Apparently, the sample size issues are not as restrictive for linear models, although I get the impression one still would need more than n=5 for each group.

Is it appropriate to use mixed models for binary response variables, or even for linear response variables, when the groups usually consist of 1 individual and at most 2 individuals?

Can anyone advise? It would be greatly appreciated.


Article: R. Moineddin et al 2007. A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology 7:34.
