Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Sample size for four-level logistic regression

| Field | Value |
| --- | --- |
| From | Clyde Schechter <[email protected]> |
| To | [email protected] |
| Subject | st: Sample size for four-level logistic regression |
| Date | Thu, 20 Jun 2013 18:17:57 -0700 |

```
I'm working with some colleagues to try to get a sense of the feasibility
of an idea we have for a study.

Our intervention would be randomized at the level of institutions, which
have a few levels of outcome-relevant internal hierarchy themselves.  The
outcome is dichotomous and is fairly rare: around 2.5 "successes" per 1,000
observations.  (Observations within institutions will be relatively
plentiful and inexpensive to obtain electronically, although limited by the
number of discharges per year they handle.  The limit on feasibility will
be the number of institutions, each of which will need resources  to
implement the intervention and program their data collection.)  Ultimately,
the analysis will require a 4-level logistic regression.

I need to get a sense of how many institutions would need to be recruited
for the study: if too large, it's a dead letter.

Were this a two- or three-level design with a continuous outcome, I would
use Optimal Design software.  Alternatively, for two-level designs there
are simple approximation formulas relating the simple random sample size to
the size needed based on a design effect calculated from the number of
observations per higher-level unit and the intraclass correlation.

But I am unaware of any software that does sample size calculations for
four-level logistic models, or of any references providing any quick
formulas for a design-effect correction.

Plan A was to do simulations.  The problem is that in the simulations, each
replication (analysis of a single simulated sample) takes 2 hours to run on
my setup, even with the Laplace approximation.  For even one candidate
number of institutions and set of assumptions about variance components, I
will need about 500 replications to get reasonable precision on the power.
So we're talking months here.  And I was hoping to try several combinations
of assumed number of institutions and variance components.  Clearly a
non-starter.

I thought about treating the three top levels as if they were a single
level, in effect, ignoring the nesting that takes place within institutions
and doing a 2-level analysis.  That could be simulated quickly, but I have
no idea whether results for that would even vaguely resemble what is needed
for a four-level model.  I also considered a linear probability model based
on the much faster -xtmixed-, but given the very low event rate I doubt
such an approach would be reasonable.

By any chance, will the expanded sample size calculations supported in
Stata 13 handle this?  Or is its speedup in runtime for xtmelogit so great
that it will deliver me from this problem?  Stata 13 will be in my hands
before I can finish 500 reps of the simulation.

Anyone have any suggestions for a plan B?  A fully polished analysis ready
for submission to a granting agency is not needed at this time, but I need
enough information to know if this study idea is even worth pursuing.

Any help will be appreciated.

Clyde Schechter
Dept. of Family & Social Medicine
```
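
For readers following the arithmetic in the post, here is a minimal Python sketch of two calculations it alludes to. The first is the usual two-level design-effect approximation, assumed here to be the Kish form `1 + (m - 1) * ICC` with equal cluster sizes; the post does not spell out which formula it has in mind. The second is the Monte Carlo precision and runtime budget implied by 500 replications at 2 hours each. All function names and the illustrative inputs (a simple-random-sample size of 1,000, 101 observations per cluster, an ICC of 0.01, a true power of 0.8) are hypothetical and do not come from the post. None of this addresses the four-level logistic case, which is the open question.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Two-level design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def clusters_needed(n_srs: int, cluster_size: int, icc: float) -> int:
    """Inflate a simple-random-sample size by the design effect,
    then convert the total to a whole number of clusters."""
    n_total = n_srs * design_effect(cluster_size, icc)
    return math.ceil(n_total / cluster_size)


def mc_standard_error(power: float, reps: int) -> float:
    """Binomial standard error of a simulated power estimate."""
    return math.sqrt(power * (1.0 - power) / reps)


def runtime_days(reps: int, hours_per_rep: float) -> float:
    """Days of continuous computing for one simulated scenario."""
    return reps * hours_per_rep / 24.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the formulas.
    print("design effect:", design_effect(101, 0.01))
    print("clusters needed:", clusters_needed(1000, 101, 0.01))

    # Figures taken from the post: 500 replications, 2 hours each.
    se = mc_standard_error(0.8, 500)
    print("Monte Carlo SE of power:", round(se, 4))
    print("approx. 95% half-width:", round(1.96 * se, 4))
    print("days per scenario:", round(runtime_days(500, 2.0), 1))
```

With 500 replications the power estimate is pinned down to within a few percentage points, but one scenario alone costs roughly six weeks of continuous computing, which is consistent with the post's "months" once several combinations of institutions and variance components are considered.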