

From: Clyde Schechter <clyde.schechter@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Sample size for four-level logistic regression
Date: Thu, 20 Jun 2013 18:17:57 -0700

I'm working with some colleagues to try to get a sense of the feasibility of an idea we have for a study. Our intervention would be randomized at the level of institutions, which have a few levels of outcome-relevant internal hierarchy themselves. The outcome is dichotomous and is fairly rare: around 2.5 "successes" per 1,000 observations. (Observations within institutions will be relatively plentiful and inexpensive to obtain electronically, although limited by the number of discharges per year they handle. The limit on feasibility will be the number of institutions, each of which will need resources to implement the intervention and program their data collection.) Ultimately, the analysis will require a four-level logistic regression.

I need to get a sense of how many institutions would need to be recruited for the study: if that number is too large, it's a dead letter. Were this a two- or three-level design with a continuous outcome, I would use Optimal Design software. Alternatively, for two-level designs there are simple approximation formulas relating the simple random sample size to the size needed, based on a design effect calculated from the number of observations per higher-level unit and the intraclass correlation. But I am unaware of any software that does sample size calculations for four-level designs with dichotomous outcomes, and I have not found any references providing any quick formulas for a design-effect correction.

Plan A was to do simulations. The problem is that in the simulations, each replication (analysis of a single simulated sample) takes 2 hours to run on my setup, even with the Laplace approximation. For even one candidate number of institutions and set of assumptions about variance components, I will need about 500 replications to get reasonable precision on the power. So we're talking months here. And I was hoping to try several combinations of assumed number of institutions and variance components. Clearly a non-starter.
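For readers who want the arithmetic spelled out, here is a minimal Python sketch of the two back-of-envelope calculations mentioned above: the standard two-level design-effect approximation, DEFF = 1 + (m - 1) * ICC, and the cost and precision of a 500-replication simulation at 2 hours per replication. The function names are made up for illustration. The simple-random-sample size, cluster size, intraclass correlation, and target power of 0.8 are hypothetical placeholders, not figures from the study. This covers only the two-level case and says nothing about how well it approximates a four-level design.

```python
import math


def design_effect(cluster_size, icc):
    """Two-level design effect: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_needed(n_srs, cluster_size, icc):
    """Total clusters needed when a simple random sample would need n_srs
    observations and each cluster contributes cluster_size observations."""
    inflated_n = n_srs * design_effect(cluster_size, icc)
    return math.ceil(inflated_n / cluster_size)


def power_mc_se(power, reps):
    """Monte Carlo standard error of a simulated power estimate
    (binomial proportion estimated from `reps` replications)."""
    return math.sqrt(power * (1.0 - power) / reps)


def total_runtime_days(reps, hours_per_rep):
    """Wall-clock days to run all replications back to back."""
    return reps * hours_per_rep / 24.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the mechanics.
    m, icc, n_srs = 1000, 0.01, 20000
    print("design effect:", design_effect(m, icc))
    print("clusters needed:", clusters_needed(n_srs, m, icc))

    # 500 replications at 2 hours each are the figures from the post;
    # the 0.8 power is an assumed target.
    print("MC standard error of power:", power_mc_se(0.8, 500))
    print("runtime in days:", total_runtime_days(500, 2))
```

With the figures from the post, a single scenario comes to roughly 42 days of continuous computing. If the true power is near 0.8, the simulated power estimate would have a Monte Carlo standard error of about 0.018.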
I thought about treating the three top levels as if they were a single level, in effect ignoring the nesting that takes place within institutions and doing a two-level analysis. That could be simulated quickly, but I have no idea whether results for that would even vaguely resemble what is needed for a four-level model. I also considered a linear probability model based on the much faster -xtmixed-, but given the very low event rate I doubt such an approach would be reasonable.

By any chance, will the expanded sample size calculations supported in Stata 13 handle this? Or is its speedup in runtime for -xtmelogit- so great that it will deliver me from this problem? Stata 13 will be in my hands before I can finish 500 reps of the simulation.

Anyone have any suggestions for a plan B? A fully polished analysis ready for submission to a granting agency is not needed at this time, but I need enough information to know if this study idea is even worth pursuing.

Any help will be appreciated.

Clyde Schechter
Dept. of Family & Social Medicine

**Follow-Ups**:

- **Re: st: Sample size for four-level logistic regression** *From:* William Buchanan <william@williambuchanan.net>
- **Re: st: Sample size for four-level logistic regression** *From:* Jeph Herrin <stata@spandrel.net>
- **Re: st: Sample size for four-level logistic regression** *From:* Phil Schumm <pschumm@uchicago.edu>
