Stata The Stata listserver

st: Re: xtlogit and logistic-cluster (REVISED)

From   Joseph Coveney <[email protected]>
To   Statalist <[email protected]>
Subject   st: Re: xtlogit and logistic-cluster (REVISED)
Date   Sun, 08 Aug 2004 20:29:00 +0900

David Airey wrote:

But when stuck with a small data set, why not run a model designed for 
that data structure, as opposed to running a model not designed for the 
data structure? When does ignoring the clustering become more favorable 
to acknowledging the presence of fewer than an optimal number clusters? 
Why is it not the case that a good model on a small data set is not 
always better than a bad model on the same small data set? I hope I'm 


Both the population-average GEE and subject-specific maximum likelihood 
approaches are considered "large-sample," and that was the basis of my 
suggesting caution with Ricardo's sample size of 50.  There wasn't any 
intention to suggest as alternatives either attempting to fit a model that 
isn't suited to the task or ignoring clustering.

It might be helpful to conceptualize the objectives of modeling in an 
exploratory-confirmatory dichotomy or continuum.  On the one hand, modeling 
could be used in exploring data in the hope that the exercise will provide 
insight.  Some would argue that model-fitting is tendentious for this.  
Exploratory use would also include using models to concisely describe insights 
gleaned from other exploratory methods.  An example of this came up on the list 
last month in the use of -ologit- with an interaction term to describe 
numerically (and perhaps lend corroboration to) what is observed in -ordplot- 
or -distplot-.  On the other hand, modeling can be used to estimate parameters 
and make formal statements about them, including confidence interval 
construction and hypothesis testing.  In this latter usage, attention to sample 
size requirements (and other assumptions) would be especially important, 
although I wouldn't throw caution to the wind in exploratory usage, either.

Ricardo didn't mention the objective of his analysis.  If it is of the 
latter type, I could imagine a reviewer--either a journal referee or a 
regulatory agency reviewer--answering David's question by stating that a good 
model on a small dataset is not better than a bad model on the same dataset 
when the sample size is not sufficient for the good model's intended use.  
And I wouldn't count on a pre-emptive mea culpa plea acknowledging the presence 
of fewer than an optimal (adequate) number of clusters to get me off the hook 
in this situation.


Ricardo Ovaldia wrote:

. . . I have a couple of follow-up questions:

> If there
> is a substantial correlation between the fixed
> effects (physician covariates) and the random 
> effect, then the parameters are liable not to be
> consistently estimated.

How can I test this?

. . . I guess the question that remains is whether or not I
can justifiably use this approach?


Hausman's test, which Stata has in -hausman- or -xthausman-, is the test I am 
aware of for this assumption in linear models.  Others on the list might be able 
to help you, but I'm not familiar enough with how this assumption is evaluated 
in nonlinear models, like logistic regression, where devising the proper 
comparison fixed-effects model is tricky.  Is it possible to test the 
assumption for patient predictors using Hausman's test (using -hausman- or 
-suest-) with -clogit- (consistent) against -xtlogit, re- (efficient / 
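
A minimal sketch of the -clogit- versus -xtlogit, re- comparison mentioned 
above, offered only as an illustration.  It uses the -union- example dataset 
that ships with Stata as a stand-in for Ricardo's data, so the variable names 
are assumptions, and the -equations(1:1)- option is an assumption about what is 
needed to line up the two coefficient vectors.  Only predictors that vary within 
cluster can enter the comparison, because -clogit- drops anything that is 
constant within a group.

```stata
* Stand-in data: idcode plays the role of the cluster (physician),
* union is the binary outcome; all names here are illustrative only.
webuse union, clear

* Conditional (fixed-effects) logit: consistent under both hypotheses
clogit union age grade not_smsa south, group(idcode)
estimates store fe

* Random-effects logit: efficient if the random effect is uncorrelated
* with the predictors.  The example dataset is already declared as
* panel data; otherwise supply the cluster identifier yourself.
xtlogit union age grade not_smsa south, re
estimates store re

* Compare the coefficients the two models have in common
hausman fe re, equations(1:1)
```

With only 50 clusters the test itself rests on large-sample theory, so a 
nonsignificant result would deserve the same caution as the model fits.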

You might consider fitting the model using -gllamm-, perhaps without the 
predictor in question, generating the random effects predictions using 
-gllapred- and examining scatterplots between them and the various predictors. 
I'm unaware of anyone in-the-know suggesting this approach, and so I suspect 
that it would have difficulty withstanding scrutiny.
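
For concreteness, the exploratory check just described might look something 
like the following.  This is a sketch under stated assumptions: -gllamm- and 
-gllapred- are user-written and must be installed first, the -union- example 
dataset and its variable names stand in for Ricardo's data, the subsample is 
taken only to keep run time manageable, and not_smsa arbitrarily plays the 
role of "the predictor in question."

```stata
* ssc install gllamm        // user-written package; install once

* Stand-in data; keep a subsample of clusters so the fit is quick
webuse union, clear
keep if idcode <= 100

* Random-intercept logistic model, omitting the predictor in question
gllamm union age grade south, i(idcode) family(binomial) link(logit) adapt

* Empirical Bayes predictions of the random intercept:
* um1 holds the posterior means, us1 their standard deviations
gllapred u, u

* Look for any pattern between the predicted random effects
* and the omitted predictor
scatter um1 not_smsa
```

Any pattern in such a plot is at best suggestive, because the predictions are 
shrunken toward zero and are themselves computed under the assumption being 
examined.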

The preferred approach might be to base the assumption's tenability externally 
upon knowledge of the area, relegating testing of the assumption to the same 
status as use of Levene's test or Bartlett's test prior to ANOVA.  Ricardo's 
data seem to come from an observational study.  Others on the list can speak 
much more authoritatively, but it's been my impression that random-effects 
regression is not especially favored in such circumstances, because the 
assumption that random effects and predictors are uncorrelated is likely to be 
violated in them, and a failure to reject the null hypothesis in the Hausman 
test would be only cold comfort.

In ignorance of Ricardo's objectives and the subject matter, about the only 
suggestion that comes to mind on whether the approach is justifiable is to 
consider the assumptions made in the process.  In addition to the two already 
mentioned, another would be the degree to which physicians (or their 
predictors) predict patient predictors.

Joseph Coveney
