
From: Joseph Coveney <jcoveney@bigplanet.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: st: Re: xtlogit and logistic-cluster (REVISED)
Date: Mon, 09 Aug 2004 19:36:57 +0900

Ricardo Ovaldia wrote:

> I am a bit baffled by the assertion that 50 clusters
> and 410 observations is a small sample size. I know it is
> not big, but I would not consider it small either.

Whether 50 clusters and 410 total observations is small depends upon the task. Advocating caution to ensure that the sample size is adequate for the intended purpose is not the same as asserting that a particular sample size is small.

For population-averaged GEE, which is sensitive to the number of clusters, rules of thumb for sample size across ranges of the number of predictors are given in M. E. Stokes, C. S. Davis & G. G. Koch, _Categorical Data Analysis Using the SAS System_, Second Edition (Cary, North Carolina: SAS Institute, 2000), p. 479. If you have many candidate predictors among those for patients and physicians, my guess is that the authors would say that 50 clusters is pretty dicey. I don't recall recently running across any corresponding guidance for random-effects logistic regression, whose adequacy depends more upon the within-cluster correlation and the total number of observations.

Can -simulate- tell you whether the sample size is adequate for your purposes (e.g., for confidence interval coverage) in your particular dataset, with the parameters set at their estimates? Generating a correlated binary variate to match the observed rho is tough, but you might be able to get reasonably close. If you're satisfied with the results of the simulation for the model's intended use, then the sample size is not too small.

In the simple-minded illustration below, with 50 clusters, a uniform cluster size of six observations, and a moderate-to-high within-cluster correlation (rho of about 80% or so), the test size was 11.5% at a nominal 5% Type I error rate. That is more than double the nominal rate, and if the purpose is hypothesis testing, then the sample size would be considered small: too small, given the nature of the data and the objective.
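The 11.5% figure is itself a Monte Carlo estimate, so it carries simulation error. A minimal sketch of how large that error is, assuming the usual binomial approximation for the proportion of replications in which the likelihood-ratio test rejected. The symbols are introduced here for illustration and are not from the post: R is the number of replications with a non-missing p-value (at most 1000), and p_r is the p-value from replication r.

```latex
\hat{\alpha} = \frac{1}{R}\sum_{r=1}^{R} \mathbf{1}\{p_r < 0.05\},
\qquad
\widehat{\mathrm{SE}}(\hat{\alpha}) = \sqrt{\frac{\hat{\alpha}\,(1-\hat{\alpha})}{R}}

\hat{\alpha} = 0.115,\ R \approx 1000:\quad
\widehat{\mathrm{SE}} \approx \sqrt{\frac{0.115 \times 0.885}{1000}} \approx 0.0101,
\qquad
0.115 \pm 1.96 \times 0.0101 \approx [0.095,\ 0.135]
```

The nominal 0.05 lies well outside this approximate interval, so the inflation is not an artifact of running only 1000 replications.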
This improves, of course, when there is no within-cluster correlation: in the simple example below, the test size falls to 6.7%, which is still substantially larger than nominal. But if this isn't critical for the objective, then the sample would not necessarily be considered small.

> The question posed in this phase of analysis is rather
> simple: Which physician and patient characteristics
> are important in predicting patient referral?

Have you considered coupling modeling with graphical analysis at this phase? The strength and nature of the relationships observed graphically could be combined with knowledge of the subject matter to judge the importance of predictors. Plots could be made of the observations, or of predictions from models after holding one or more covariates at reference values. If your audience doesn't feel comfortable judging the strength or importance of a relationship from a graphical presentation, then the predictions can be described numerically, either with summary statistics (including tabulations) or with a model, perhaps with standardized coefficients if that makes it easier for your audience.

For the next phase, the model can be made parsimonious based upon what's observed in the plots or what's judged unimportant in earlier stages of exploration. It might be beneficial to use two models to describe your observations: one, a conditional logistic regression with physicians as groups, to describe the patient characteristics that predict referral; the other, a count model, to describe the physician characteristics that predict referral rates.
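One way to see why the two-model split is natural: in a conditional logistic regression with physicians as groups (in Stata, -clogit- with the physician identifier in group()), conditioning on each physician's number of referrals removes the physician-specific intercept, and with it every covariate that is constant within physician. A sketch under assumed notation not taken from the post: y_ij is the referral indicator for patient i of physician j, x_ij are patient covariates, z_j are physician covariates, n_j is the number of patients, and k_j is the number referred. The last line is only one plausible count model (Poisson or negative binomial with log n_j as an offset, e.g. via the exposure() option of -poisson- or -nbreg-); the post says only "a count model".

```latex
\operatorname{logit} \Pr(y_{ij} = 1) = \alpha_j + x_{ij}'\beta + z_j'\gamma,
\qquad k_j = \sum_{i=1}^{n_j} y_{ij}

\Pr\bigl(y_{1j},\dots,y_{n_j j} \,\big|\, k_j\bigr)
= \frac{\exp\bigl(\sum_i y_{ij}\, x_{ij}'\beta\bigr)}
       {\sum_{d \in S_j} \exp\bigl(\sum_i d_i\, x_{ij}'\beta\bigr)},
\qquad
S_j = \Bigl\{ d \in \{0,1\}^{n_j} : \textstyle\sum_i d_i = k_j \Bigr\}

\log \mathrm{E}[k_j] = \log n_j + \delta_0 + z_j'\gamma^{*}
```

The factor exp{k_j(alpha_j + z_j'gamma)} appears in the numerator and in every term of the denominator, so it cancels: the conditional model estimates beta for patient characteristics but cannot estimate gamma, which is why the physician characteristics go into the separate count model (gamma* is on the rate scale and is not the same parameter as gamma). Physicians who referred none or all of their patients have a single element in S_j and contribute nothing to the conditional likelihood.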
Joseph Coveney

----------------------------------------------------------------------------
* Simulation 1: correlated binary outcomes (latent correlation 0.8)
clear
set more off
set seed 20040809
* Build a 6 x 6 correlation matrix A: 1 on the diagonal, 0.8 elsewhere
set obs 6
forvalues i = 1/6 {
    generate float rho`i' = 0.8
    replace rho`i' = 1 in `i'
}
mkmat rho*, matrix(A)
*
capture program drop xtlogitsimc
program define xtlogitsimc, rclass
    version 8.2
    * 50 patients (clusters), each with 6 correlated latent normal outcomes
    drawnorm dep1 dep2 dep3 dep4 dep5 dep6, corr(A) n(50) clear
    generate byte pid = _n
    * Treatment indicator for the second half of the clusters (no true effect)
    generate byte trt = _n > _N / 2
    reshape long dep, i(pid) j(tim)
    * Dichotomize the latent outcome at zero
    replace dep = dep > 0
    compress
    * Full model (trt + time dummies) vs. intercept-only, both random-intercept
    xi: xtlogit dep trt i.tim, i(pid) re
    estimates store A
    xtlogit dep, i(pid) re
    estimates store B
    lrtest A B
    return scalar p = r(p)
end
*
simulate "xtlogitsimc" p = r(p), reps(1000)
* Proportion of replications rejecting at the nominal 5% level
generate byte pos = p < 0.05
replace pos = . if p >= .
summarize pos
*
* Simulation 2: independent binary outcomes (no within-cluster correlation)
capture program drop xtlogitsimi
program define xtlogitsimi, rclass
    version 8.2
    * Overwrite the outcome with independent Bernoulli(0.5) draws
    replace dep = uniform() > 0.5
    xi: xtlogit dep trt i.tim, i(pid) re
    estimates store A
    xtlogit dep, i(pid) re
    estimates store B
    lrtest A B
    return scalar p = r(p)
    estimates drop _all
end
*
* Fixed design: 50 clusters x 6 periods; dep is filled in by xtlogitsimi
clear
set obs 50
generate byte pid = _n
generate byte trt = _n > _N / 2
forvalues i = 1/6 {
    generate byte dep`i' = .
}
reshape long dep, i(pid) j(tim)
simulate "xtlogitsimi" p = r(p), reps(1000)
generate byte pos = p < 0.05
replace pos = . if p >= .
summarize pos
exit
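Both simulation programs compare a random-intercept model containing trt and the five time-period indicators created by -xi- (model A) with an intercept-only random-intercept model (model B). Model A has six more parameters than model B, so -lrtest- refers the likelihood-ratio statistic to an approximate chi-squared distribution with 6 degrees of freedom. Written out, with the log likelihoods denoted by symbols introduced here for illustration:

```latex
LR = 2\,(\ell_A - \ell_B) \;\overset{\text{approx.}}{\sim}\; \chi^2_{6}
\quad\text{under } H_0,
\qquad
\text{reject at the 5\% level if } LR > \chi^2_{6,\,0.95} \approx 12.59
```

The variable pos records exactly this rejection (r(p) < 0.05), so the mean reported by -summarize pos- is the empirical test size, and its distance from 0.05 measures how well the asymptotic reference distribution holds with 50 clusters.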
