
From: Joseph Coveney <jcoveney@bigplanet.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: st: Re: xtlogit and logistic-cluster (REVISED)
Date: Sun, 08 Aug 2004 20:29:00 +0900

David Airey wrote:

But when stuck with a small data set, why not run a model designed for that data structure, as opposed to running a model not designed for the data structure? When does ignoring the clustering become preferable to acknowledging the presence of fewer than an optimal number of clusters? Why is it not the case that a good model on a small data set is always better than a bad model on the same small data set? I hope I'm clear.

-------------------------------------------------------------------------------

Both the population-averaged GEE and the subject-specific maximum-likelihood approaches are considered "large-sample" methods, which is why I suggested caution with Ricardo's sample size of 50. I did not intend to suggest, as alternatives, either fitting a model that isn't suited to the task or ignoring the clustering.

It might help to think of the objectives of modeling as lying on an exploratory-confirmatory dichotomy or continuum. At one end, modeling can be used to explore data in the hope that the exercise will provide insight. Some would argue that model-fitting is tendentious for this purpose. Exploratory use would also include using models to describe concisely the insights gleaned from other exploratory methods. An example came up on the list last month: using -ologit- with an interaction term to describe numerically (and perhaps lend corroboration to) what is seen in -ordplot- or -distplot-.

At the other end, modeling can be used to estimate parameters and make formal statements about them, including constructing confidence intervals and testing hypotheses. In this latter usage, attention to sample-size requirements (and other assumptions) is especially important, although I wouldn't throw caution to the wind in exploratory usage, either. Ricardo didn't mention the objective of his analysis.
If it is the confirmatory kind, I could imagine a reviewer (either a journal referee or a regulatory agency reviewer) answering David's question by stating that a good model on a small dataset is no better than a bad model on the same dataset when the sample size is insufficient for the good model's intended use. And I wouldn't count on a pre-emptive mea culpa acknowledging the presence of fewer than an optimal (adequate) number of clusters to get me off the hook in this situation.

--------------------------------------------------------------------------------

Ricardo Ovaldia wrote:

. . . I have a couple of follow-up questions:

> If there is a substantial correlation between the fixed
> effects (physician covariates) and the random effect,
> then the parameters are liable not to be consistently
> estimated.

How can I test this?

. . . I guess the question that remains is whether or not I can justifiably use this approach?

-------------------------------------------------------------------------------

Hausman's test, available in Stata as -hausman- or -xthausman-, is the one I am aware of for testing this assumption in linear models. Others on the list might be able to help you further; I'm not familiar enough with how this assumption is evaluated in nonlinear models such as logistic regression, where devising the proper comparison fixed-effects model is tricky. Is it possible to test the assumption for patient predictors with Hausman's test (via -hausman- or -suest-), comparing -clogit- (consistent) against -xtlogit, re- (efficient, and consistent under the null)?

You might consider fitting the model using -gllamm-, perhaps without the predictor in question, generating predictions of the random effects using -gllapred-, and examining scatterplots between them and the various predictors. However, I'm unaware of anyone in the know suggesting this approach, so I suspect that it would have difficulty withstanding scrutiny.
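For readers wondering what the Hausman comparison computes: given the consistent estimates b_c (e.g., from -clogit-) and the efficient estimates b_e (e.g., from -xtlogit, re-) for the k patient-level coefficients common to both models, along with their variance matrices V_c and V_e, the statistic is H = d'(V_c - V_e)^(-1) d with d = b_c - b_e. It is referred to a chi-squared distribution with k degrees of freedom. The Python sketch below is illustrative only and is not Stata's -hausman- code. The function names and the numbers in the usage lines are hypothetical. It uses the exact closed forms of the chi-squared upper tail for integer degrees of freedom, and it declines to report a p-value when the statistic comes out negative, which can happen when V_c - V_e is not positive definite in a finite sample.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("V_c - V_e is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rhs = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = rhs / m[r][r]
    return x


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer k df."""
    half = x / 2.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))
    # Odd df: erfc term plus half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (k - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def hausman(b_c, b_e, v_c, v_e):
    """Return (H, df, p) for H = d'(V_c - V_e)^{-1} d with d = b_c - b_e."""
    k = len(b_c)
    d = [bc - be for bc, be in zip(b_c, b_e)]
    v_diff = [[v_c[i][j] - v_e[i][j] for j in range(k)] for i in range(k)]
    stat = sum(di * xi for di, xi in zip(d, _solve(v_diff, d)))
    if stat < 0:
        raise ValueError("negative statistic; V_c - V_e is not positive definite")
    return stat, k, chi2_sf(stat, k)


# Hypothetical estimates for two patient-level coefficients (not real output).
stat, df, p = hausman([0.5, 0.1], [0.3, 0.0],
                      [[0.02, 0.0], [0.0, 0.02]],
                      [[0.01, 0.0], [0.0, 0.01]])
# stat is 5.0 (up to rounding), df is 2, p = exp(-2.5), about 0.082
```

In line with the caution in the message, a non-significant H here would not establish that the random effects and the predictors are uncorrelated. It would only fail to detect a difference with the data at hand.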
The preferred approach might be to judge the assumption's tenability from outside knowledge of the subject area, relegating formal testing of the assumption to the same status as using Levene's or Bartlett's test before an ANOVA.

Ricardo's data seem to come from an observational study. Others on the list can speak much more authoritatively, but my impression has been that random-effects regression is not especially favored in such circumstances, because the assumption that the random effects and the predictors are uncorrelated is likely to be violated there, and a failure to reject the null hypothesis in the Hausman test would be only cold comfort.

Without knowing Ricardo's objectives or the subject matter, about the only suggestion that comes to mind on whether the approach is justifiable is to consider the assumptions it makes. In addition to the two already mentioned, another would be the degree to which physicians (or their predictors) predict patient predictors.

Joseph Coveney

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
