
From: Ricardo Ovaldia <ovaldia@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional logistic
Date: Thu, 25 Oct 2007 07:08:02 -0700 (PDT)

Great! Thank you,
Ricardo

--- "David W. Harless" <dwharles@vcu.edu> wrote:

> Ricardo Ovaldia wrote:
> > Dear all,
> >
> > I posted this under a different header and did not get
> > a reply. So let me ask the question better.
> >
> > What is the difference between conditional logistic
> > regression grouping on clinic and unconditional
> > logistic regression including clinic as a dummy
> > (indicator) variable? That is, what is the difference
> > in model assumptions and parameter estimates?
> >
> > Thank you,
> > Ricardo.
>
> The most important difference is that logit/logistic regression with
> dummy variables for groups is inconsistent unless the number of
> observations per group is large. There is a brief discussion of this
> (including cites) in the manual entry for -clogit- (page 224 of
> Reference A-J for Release 9 Manual).
>
> Way back when in February, 2000 Bill Gould and Vince Wiggins posted
> the note pasted below which gives a good explanation of these issues.
>
> Dave Harless
>
> > Jen Ireland <Jen.Ireland@bristol.ac.uk> wrote,
> >
> >> > I am estimating a logit model in which I have clustered the observations
> >> > on the basis of a particular variable, not otherwise included in the
> >> > model, as I have reason to believe that the observations may not be
> >> > independent within the clusters.
> >> >
> >> > A colleague has argued that I could do just as well by simply including
> >> > the clustering variable as an explanatory variable in my model. Why is
> >> > it better to use clustering?
> >
> > Unless there is something very odd about Jen's problem about which he is not
> > telling us, I assume Jen's colleague is suggesting not that Jen simply include
> > the cluster variable as a single variable in his model, but that Jen include a
> > set of dummies for each value of the cluster variable.
> >
> > Assume I have data grouped into clusters and I label the clusters 1, 2, 3,
> > and so on.
> > If I included the cluster variable as a single variable, I would
> > obtain a single coefficient for the cluster variable -- call it b -- and I
> > would be saying that the effect of being in the first cluster is b, the effect
> > of being in the second cluster is 2*b, and so on.
> >
> > But my labeling of the groups as cluster 1, 2, 3, is arbitrary, I assume. I
> > could just as well order the clusters, putting what is now cluster 3 into the
> > first position, cluster 1 in the second, and so on. Then I could call those
> > clusters 1, 2, 3 ..., and therein lies a problem.
> >
> > So I assume that the suggestion was to include a dummy variable for the
> > first cluster, another dummy variable for the second, and so on.
> >
> > Given that interpretation, and with respect, I must disagree with Jen's
> > colleague. To make a long story short (which long story I am about to tell),
> > Jen's colleague perhaps wished to suggest Jen use conditional logistic
> > regression (clogit) as an alternative to -logit, cluster() robust-. Had he
> > said that, I would, in some cases, have agreed.
> >
> >
> > The basis of Jen's colleague's comment
> > --------------------------------------
> >
> > Rather than using the clustering correction in calculating the standard
> > errors, one could instead model the clustering. If one does that, and if one
> > has the modeling (meaning the assumptions) right, one should be able to
> > produce more efficient estimates than those produced by -robust cluster()-.
> >
> > Within-cluster correlation can arise for any number of reasons, but one
> > particular reason is that each cluster has its own intercept. In that case,
> > one is tempted to estimate those intercepts by simply including the dummy
> > variables.
> >
> > That approach works in the case of linear regression, but it does not work in
> > general. Said technically, the asymptotics are violated.
> > Call the number of clusters n and the average number of observations
> > within cluster T, so that the total number of observations is N=n*T.
> > As T->infinity, all is well. As
> > n->infinity, however, both the number of estimated parameters (coefficients on
> > the dummy variables) and the number of observations are going to infinity
> > together and only in strange cases does it work out that any of the estimated
> > parameters approach their true values.
> >
> > The strange case is linear regression and that occurs because it is linear
> > (although the reason is not transparent).
> >
> > In the case of logistic regression, however, the estimates one obtains from
> > including all the dummies are biased and, even as n->infinity, that bias never
> > goes away. Vince Wiggins <vwiggins@stata.com> and I recently simulated this
> > and discovered that this is not a sterile, theoretical argument -- the
> > estimates one obtains for the parameters are genuinely bad.
> >
> > To obtain good estimates, one must develop a new estimator. Models with
> > separate intercepts per cluster are known as "fixed-effects models". In the
> > case of logistic regression, this fixed-effects estimator is conditional
> > logistic regression.
> >
> > Thus, conditional logistic regression -- Stata's -clogit- command -- is an
> > alternative to using -robust cluster()-. In the case where the correlation
> > arises because of fixed effects (different intercepts across groups), -clogit-
> > is better than -robust cluster()- because it produces more efficient
> > estimates, meaning more accurate estimates with smaller standard errors,
> > and it is even better than that because there is now more going on in this
> > model than just correlation within cluster (namely, the possibility of
> > correlation of the fixed effects with other covariates) and -clogit- is
> > taking that into account, too.
> >
> > However, correlation within group can arise for a lot of reasons. Perhaps
> > the observations within groups are serially correlated, or perhaps two of the
> > observations are whoppingly correlated and, after that, there is not much
> > correlation at all, or perhaps the correlation structure differs across the
> > clusters. In that case, -clogit- will not produce correct standard errors.
> >
> > Meanwhile, -robust cluster()- will continue to produce correct standard errors
> > for its inefficient but population-wise consistent estimates.
> >
> > -- Bill                           -- Vince
> > wgould@stata.com                  vwiggins@stata.com
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK

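The inconsistency described in the quoted note can be checked numerically. What follows is a minimal sketch in plain Python (standard library only). It is not the simulation Bill and Vince ran, and everything in it is an assumption made for illustration: clusters of T = 2 observations, a true slope of 1, normally distributed cluster intercepts that are correlated with x, and the hypothetical helper names `simulate_pairs` and `fit_slope`.

With T = 2, only clusters with one success and one failure inform the slope in either model. Write `d` for x at the success minus x at the failure. The conditional likelihood contribution of such a cluster is `invlogit(b*d)`. In the logit with one dummy per cluster, the fitted intercept for that cluster works out to `-b*(x1+x2)/2`, so the profile likelihood contribution is `invlogit(b*d/2)` squared. The two fits therefore differ only by a rescaling of `b`, and the dummy-variable estimate is twice the conditional one.

```python
import math
import random


def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_pairs(n_groups, beta, rng):
    """Simulate clusters of T = 2 observations with cluster-specific intercepts.

    Returns d = x[success] - x[failure] for each discordant cluster. Clusters
    whose two outcomes agree carry no information about beta in either model.
    """
    diffs = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)                           # cluster intercept
        x = [alpha + rng.gauss(0.0, 1.0) for _ in range(2)]   # x correlated with alpha
        y = [rng.random() < inv_logit(alpha + beta * xi) for xi in x]
        if y[0] != y[1]:
            diffs.append(x[0] - x[1] if y[0] else x[1] - x[0])
    return diffs


def fit_slope(diffs, scale):
    """Maximize sum(log inv_logit(scale * b * d)) over b by Newton's method."""
    b = 0.0
    for _ in range(100):
        grad = 0.0
        hess = 0.0
        for d in diffs:
            p = inv_logit(scale * b * d)
            grad += scale * d * (1.0 - p)
            hess -= (scale * d) ** 2 * p * (1.0 - p)
        step = grad / hess
        b -= step
        if abs(step) < 1e-10:
            break
    return b


rng = random.Random(2007)                  # fixed seed, chosen arbitrarily
diffs = simulate_pairs(8000, beta=1.0, rng=rng)

b_cond = fit_slope(diffs, scale=1.0)       # conditional (fixed-effects) logit
b_dummy = fit_slope(diffs, scale=0.5)      # logit with one dummy per cluster (profiled)

print(f"true beta = 1.0  conditional = {b_cond:.3f}  dummies = {b_dummy:.3f}")
```

Run as a script, this prints both estimates. The conditional estimate should land near the true slope and the dummy-variable estimate near twice that, and drawing more clusters does not shrink the gap, because T stays fixed at 2. In Stata the two fits would correspond to something like -clogit y x, group(id)- and -logit- with a full set of dummies for id, where y, x, and id are placeholder variable names.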
**References**: **Re: st: conditional logistic**, *From:* "David W. Harless" <dwharles@vcu.edu>
