


Re: st: Sample size for four-level logistic regression

From   Phil Schumm <>
To   <>
Subject   Re: st: Sample size for four-level logistic regression
Date   Fri, 21 Jun 2013 12:20:19 -0500

On Jun 20, 2013, at 8:17 PM, Clyde Schechter <> wrote:
> Our intervention would be randomized at the level of institutions, which have a few levels of outcome-relevant internal hierarchy themselves.  The outcome is dichotomous and is fairly rare: around 2.5 "successes" per 1,000 observations.  (Observations within institutions will be relatively plentiful and inexpensive to obtain electronically, although limited by the number of discharges per year they handle.  The limit on feasibility will be the number of institutions, each of which will need resources to implement the intervention and program their data collection.)  Ultimately, the analysis will require a 4-level logistic regression.


> But I am unaware of any software that does sample size calculations for four-level designs with dichotomous outcomes, and I have not found any references providing any quick formulas for a design-effect correction.


> A fully polished analysis ready for submission to a granting agency is not needed at this time, but I need enough information to know if this study idea is even worth pursuing.

If it were me, I would start by looking at a logistic regression of the proportion of "successes" for each institution on an indicator for the intervention (i.e., in which the institutions are the level of the analysis, the outcome is the proportion of successes, and there is a single binary covariate corresponding to the intervention).  To the extent that you expect correlation within an institution, you would need to account for this either by using a single dispersion parameter (i.e., using -glm-) or by using the robust variance estimate (i.e., using -blogit-).  If you do the former, then you can do the calculation analytically (i.e., for a given within-institution correlation, just plug in the corresponding value of the over-dispersion parameter), while if you do the latter, a simple simulation would probably be easiest (e.g., add an institution-level random effect when generating the observed proportions).

If your power for detecting the intervention is miserable within the context of this simple model, then that should be a pretty good indication that you need to rethink.  Even if you expect to increase your precision by modeling some of the within-institution factors, I wouldn't want to count on that unless I had *very good* information about how they are distributed and how they are related to the outcome.
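
For illustration only (this is not part of the original message), the analytic route with a single over-dispersion parameter might be sketched in Python roughly as follows.  The sketch assumes equal-sized institutions, so that a within-institution correlation rho with m observations per institution corresponds to an over-dispersion factor of 1 + (m - 1) * rho, and it uses a two-sided Wald test on the log odds ratio.  The function name and every input other than the 2.5 per 1,000 baseline rate (the effect size, the number of institutions, the observations per institution, and the correlation) are hypothetical placeholders.

```python
from math import log, sqrt
from statistics import NormalDist


def power_institution_logit(p0, p1, inst_per_arm, obs_per_inst, icc, alpha=0.05):
    """Approximate power to detect the intervention in an institution-level
    logistic regression with one over-dispersion parameter.

    ASSUMPTIONS (not from the original post): equal-sized institutions,
    equal numbers of institutions per arm, and a Wald test on the log
    odds ratio.
    """
    # Over-dispersion implied by the within-institution correlation,
    # assuming every institution contributes obs_per_inst observations.
    phi = 1 + (obs_per_inst - 1) * icc

    # Total observations in each arm.
    n_arm = inst_per_arm * obs_per_inst

    # Binomial variance of the log odds ratio, inflated by phi.
    var_log_or = phi * (
        1 / (n_arm * p0 * (1 - p0)) + 1 / (n_arm * p1 * (1 - p1))
    )

    # Standardized effect: |log odds ratio| divided by its standard error.
    log_or = log(p1 / (1 - p1)) - log(p0 / (1 - p0))
    z_effect = abs(log_or) / sqrt(var_log_or)

    # Two-sided Wald test; the rejection probability in the opposite
    # tail is ignored because it is negligible for non-trivial effects.
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit)


if __name__ == "__main__":
    baseline = 0.0025   # 2.5 successes per 1,000 (from the question)
    improved = 0.00375  # hypothetical 50% relative increase
    for icc in (0.0, 0.001, 0.01):          # hypothetical correlations
        for k in (10, 20, 40):              # hypothetical institutions per arm
            pw = power_institution_logit(baseline, improved, k, 5000, icc)
            print(f"icc={icc:<6} institutions/arm={k:<3} power={pw:.3f}")
```

Scanning a grid like this over plausible correlations and institution counts shows quickly whether power is hopeless under the simple model.  For the robust-variance route, a simulation that draws an institution-level random effect before generating each institution's proportion would take the place of the closed-form variance above.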
