Statalist



Re: st: conditional logistic


From   Ricardo Ovaldia <ovaldia@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: conditional logistic
Date   Thu, 25 Oct 2007 07:08:02 -0700 (PDT)

Great! Thank you,

Ricardo

--- "David W. Harless" <dwharles@vcu.edu> wrote:

> Ricardo Ovaldia wrote:
> > Dear all,
> > 
> > I posted this under a different header and did not get
> > a reply. So let me ask the question better.
> > 
> > What is the difference between conditional logistic
> > regression grouping on clinic and unconditional
> > logistic regression including clinic as a dummy
> > (indicator) variable? That is, what is the difference
> > in model assumptions and parameter estimates?
> > 
> > Thank you,
> > Ricardo.
> 
> The most important difference is that logit/logistic regression with dummy
> variables for groups is inconsistent unless the number of observations per
> group is large.  There is a brief discussion of this (including cites) in the
> manual entry for -clogit- (page 224 of Reference A-J for Release 9 Manual).
> 
> Way back in February 2000, Bill Gould and Vince Wiggins posted the note
> pasted below, which gives a good explanation of these issues.
> 
> Dave Harless
> 
> > Jen Ireland <Jen.Ireland@bristol.ac.uk> wrote, 
> > 
> >> > I am estimating a logit model in which I have clustered the observations
> >> > on the basis of a particular variable, not otherwise included in the
> >> > model, as I have reason to believe that the observations may not be
> >> > independent within the clusters.
> >> > 
> >> > A colleague has argued that I could do just as well by simply including
> >> > the clustering variable as an explanatory variable in my model.  Why is
> >> > it better to use clustering?
> > 
> > Unless there is something very odd about Jen's problem about which he is not
> > telling us, I assume Jen's colleague is suggesting not that Jen simply include
> > the cluster variable as a single variable in his model, but that Jen include a
> > set of dummies, one for each value of the cluster variable.
> > 
> > Assume I have data grouped into clusters and I label the clusters 1, 2, 3,
> > and so on.  If I included the cluster variable as a single variable, I would
> > obtain a single coefficient for the cluster variable -- call it b -- and I
> > would be saying that the effect of being in the first cluster is b, the effect
> > of being in the second cluster is 2*b, and so on.
> > 
> > But my labeling of the groups as cluster 1, 2, 3, is arbitrary, I assume.  I
> > could just as well order the clusters, putting what is now cluster 3 into the
> > first position, cluster 1 in the second, and so on.  Then I could call those
> > clusters 1, 2, 3 ..., and therein lies a problem: the effects b, 2*b, 3*b
> > would now be assigned to different clusters.
> > 
> > So I assume that the suggestion was to include a dummy variable for the
> > first cluster, another dummy variable for the second, and so on.
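
A small sketch of the two codings, in Python for illustration only. The function names and the five-observation example are made up here and are not from the note. It shows that entering the label as one numeric variable ties the implied effects to an arbitrary numbering, while the set of indicator (dummy) columns is the same whatever numbering is used.

```python
def linear_effects(labels, b):
    """Effect implied for each cluster when the label enters as ONE numeric variable."""
    return {c: b * c for c in sorted(set(labels))}

def indicator_columns(labels):
    """One 0/1 column per distinct cluster label (dummy coding)."""
    levels = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in levels}

# Hypothetical data: five observations in three clusters.
labels = [1, 1, 2, 3, 3]

# An equally arbitrary renumbering: old cluster 3 becomes 1, 1 becomes 2, 2 becomes 3.
relabel = {1: 2, 2: 3, 3: 1}
relabeled = [relabel[c] for c in labels]

# Single-variable coding: the same physical cluster gets a different implied effect.
effects_before = linear_effects(labels, b=0.5)
effects_after = linear_effects(relabeled, b=0.5)
old3_before = effects_before[3]
old3_after = effects_after[relabel[3]]

# Dummy coding: the collection of indicator columns is unchanged by renumbering.
cols_before = indicator_columns(labels)
cols_after = indicator_columns(relabeled)
same_columns = sorted(cols_before.values()) == sorted(cols_after.values())

print("old cluster 3, effect before/after relabeling:", old3_before, old3_after)
print("dummy columns unchanged by relabeling:", same_columns)
```
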
> > 
> > Given that interpretation, and with respect, I must disagree with Jen's
> > colleague.  To make a long story short (which long story I am about to tell),
> > Jen's colleague perhaps wished to suggest Jen use conditional logistic
> > regression (clogit) as an alternative to -logit, cluster() robust-.  Had he
> > said that, I would, in some cases, have agreed.
> > 
> > 
> > The basis of Jen's colleague's comment
> > --------------------------------------
> > 
> > Rather than using the clustering correction when calculating the standard
> > errors, one could instead model the clustering.  If one does that, and if one
> > has the modeling (meaning the assumptions) right, one should be able to
> > produce more efficient estimates than those produced by -robust cluster()-.
> > 
> > Within-cluster correlation can arise for any number of reasons, but one
> > particular reason is that each cluster has its own intercept.  In that case,
> > one is tempted to estimate those intercepts by simply including the dummy
> > variables.
> > 
> > That approach works in the case of linear regression, but it does not work in
> > general.  Said technically, the asymptotics are violated.  Call the number of
> > clusters n and the average number of observations within cluster T, so that
> > the total number of observations is N=n*T.  As T->infinity, all is well.  As
> > n->infinity with T fixed, however, both the number of estimated parameters
> > (coefficients on the dummy variables) and the number of observations are going
> > to infinity together and only in strange cases does it work out that any of
> > the estimated parameters approach their true values.
> > 
> > The strange case is linear regression and that occurs because it is linear
> > (although the reason is not transparent).
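
One common way to see why the linear case is special is sketched below. The notation is introduced here for illustration and is not from the note: a single covariate x_it, cluster intercepts a_i, and errors e_it, with bars denoting within-cluster means. Subtracting each cluster's mean removes a_i exactly, so the slope can be estimated without estimating any of the n intercepts.

```latex
\begin{align*}
y_{it} &= a_i + b\,x_{it} + e_{it}, \qquad i = 1,\dots,n,\quad t = 1,\dots,T \\
\bar{y}_i &= a_i + b\,\bar{x}_i + \bar{e}_i \\
y_{it} - \bar{y}_i &= b\,(x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)
\end{align*}
```

Least squares on the demeaned data gives the same b as the regression with all the dummies, and under the usual exogeneity assumptions it remains consistent as n->infinity with T fixed. No such subtraction removes a_i from the logistic model, because there the intercept sits inside a nonlinear function.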
> > 
> > In the case of logistic regression, however, the estimates one obtains from
> > including all the dummies are biased and, even as n->infinity, that bias never
> > goes away.  Vince Wiggins <vwiggins@stata.com> and I recently simulated this
> > and discovered that this is not a sterile, theoretical argument -- the estimates
> > one obtains for the parameters are genuinely bad.
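
The simulation mentioned above is not reproduced in the note, so what follows is only a minimal sketch of the same idea, in Python, under assumptions chosen here: clusters of size T = 2, one standard-normal covariate, standard-normal cluster intercepts, and a true coefficient of 1. All names are hypothetical. With T = 2 the intercept of a cluster with one success and one failure has the closed form a = -b*(x1 + x2)/2, and clusters with two successes or two failures drop out of both likelihoods, so both estimators reduce to one-parameter problems in the within-cluster difference d.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate_discordant_pairs(n_clusters, beta, rng):
    """For each size-2 cluster with exactly one success, return
    d = x(observation with y=1) - x(observation with y=0)."""
    diffs = []
    for _ in range(n_clusters):
        alpha = rng.gauss(0.0, 1.0)                      # cluster-specific intercept
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = [1 if rng.random() < logistic(alpha + beta * xi) else 0 for xi in x]
        if y[0] + y[1] == 1:
            diffs.append(x[0] - x[1] if y[0] == 1 else x[1] - x[0])
    return diffs

def maximize(diffs, scale):
    """Newton-Raphson for the b that maximizes sum(log(logistic(scale * b * d)))."""
    b = 0.0
    for _ in range(100):
        grad = 0.0
        hess = 0.0
        for d in diffs:
            p = logistic(scale * b * d)
            grad += scale * d * (1.0 - p)
            hess -= (scale * d) ** 2 * p * (1.0 - p)
        step = grad / hess
        b -= step
        if abs(step) < 1e-12:
            break
    return b

rng = random.Random(12345)
true_beta = 1.0
diffs = simulate_discordant_pairs(5000, true_beta, rng)

# Conditional logit with T = 2: log-likelihood is sum(log(logistic(b * d))).
b_cond = maximize(diffs, scale=1.0)

# Logit with one dummy per cluster, intercepts profiled out:
# log-likelihood is 2 * sum(log(logistic(b * d / 2))); the factor 2 does not move the maximum.
b_dummy = maximize(diffs, scale=0.5)

print("true beta:", true_beta)
print("conditional logit estimate:", b_cond)
print("dummy-variable logit estimate:", b_dummy)
```

In this T = 2 setup the dummy-variable estimate is exactly twice the conditional estimate, so it settles near twice the true coefficient however many clusters are added, while the conditional estimate settles near the true value.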
> > 
> > To obtain good estimates, one must develop a new estimator.  Models with
> > separate intercepts per cluster are known as "fixed-effects models".  In the
> > case of logistic regression, this fixed-effects estimator is conditional
> > logistic regression.
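
Why conditioning gets around the problem can be sketched as follows, assuming a logit model with cluster intercept a_i and covariate vector x_it (notation introduced here for illustration, not taken from the note). Conditioning on the number of successes k_i within cluster i makes a_i cancel from the likelihood:

```latex
\begin{align*}
\Pr(y_{it} = 1) &= \frac{\exp(a_i + x_{it} b)}{1 + \exp(a_i + x_{it} b)} \\
\Pr\Bigl(y_{i1},\dots,y_{iT} \Bigm| \sum_t y_{it} = k_i\Bigr)
  &= \frac{\exp\bigl(\sum_t y_{it}\, x_{it} b\bigr)}
          {\sum_{d \in S_i} \exp\bigl(\sum_t d_t\, x_{it} b\bigr)},
\qquad S_i = \Bigl\{ d \in \{0,1\}^T : \sum_t d_t = k_i \Bigr\}
\end{align*}
```

The conditional probability does not involve a_i, so b can be estimated from it without estimating any intercepts. Clusters in which every outcome is 0 or every outcome is 1 contribute nothing to this likelihood.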
> > 
> > Thus, conditional logistic regression -- Stata's -clogit- command -- is an
> > alternative to using -robust cluster()-.  In the case where the correlation
> > arises because of fixed effects (different intercepts across groups), -clogit-
> > is better than -robust cluster()- because it produces more efficient
> > estimates, meaning more accurate estimates with smaller standard errors.
> > It is even better than that because there is now more going on in this
> > model than just correlation within cluster (namely, the possibility of
> > correlation of the fixed effects with other covariates) and -clogit- is
> > taking that into account, too.
> > 
> > However, correlation within group can arise for a lot of reasons.  Perhaps
> > the observations within groups are serially correlated, or perhaps two of the
> > observations are whoppingly correlated and, after that, there is not much
> > correlation at all, or perhaps the correlation structure differs across the
> > clusters.  In that case, -clogit- will not produce correct standard errors.
> > 
> > Meanwhile, -robust cluster()- will continue to produce correct standard errors
> > for its inefficient but population-wise consistent estimates.
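
To make the standard-error point concrete, here is a rough Python sketch of a cluster-robust (sandwich) variance for a one-parameter logit. It is not Stata's implementation: it assumes a model with a single covariate and no intercept, it omits any finite-sample multiplier that a package may apply, and the data and function names are invented for illustration. The extreme case shown is an arbitrary within-cluster correlation, with every observation entered twice in the same cluster.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit_slope(x, y):
    """Newton-Raphson MLE of b in P(y=1) = logistic(b*x) (no intercept, for brevity)."""
    b = 0.0
    for _ in range(100):
        grad = sum(xi * (yi - logistic(b * xi)) for xi, yi in zip(x, y))
        hess = -sum(xi * xi * logistic(b * xi) * (1.0 - logistic(b * xi)) for xi in x)
        step = grad / hess
        b -= step
        if abs(step) < 1e-12:
            break
    return b

def variances(x, y, cluster, b):
    """Return (model-based variance, cluster-robust sandwich variance) of b."""
    info = sum(xi * xi * logistic(b * xi) * (1.0 - logistic(b * xi)) for xi in x)
    cluster_scores = {}
    for xi, yi, g in zip(x, y, cluster):
        # Scores are summed within cluster before squaring.
        cluster_scores[g] = cluster_scores.get(g, 0.0) + xi * (yi - logistic(b * xi))
    meat = sum(s * s for s in cluster_scores.values())
    return 1.0 / info, meat / info ** 2

# Hypothetical data: eight observations, each its own cluster.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 1.5, -1.5]
y = [0, 1, 0, 0, 1, 1, 0, 1]
ids = list(range(8))
b_unique = fit_logit_slope(x, y)
naive_unique, robust_unique = variances(x, y, ids, b_unique)

# Same data with every observation duplicated inside its cluster:
# no new information, but the model-based variance is cut in half.
x2, y2, ids2 = x + x, y + y, ids + ids
b_dup = fit_logit_slope(x2, y2)
naive_dup, robust_dup = variances(x2, y2, ids2, b_dup)

print("model-based variance, unique vs duplicated:", naive_unique, naive_dup)
print("cluster-robust variance, unique vs duplicated:", robust_unique, robust_dup)
```

The model-based variance halves when the copies are added, although they carry no new information; the cluster-robust variance is unchanged, which is the sense in which it stays correct under within-cluster correlation of unknown form.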
> > 
> > -- Bill               -- Vince
> > wgould@stata.com         vwiggins@stata.com
> > 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


Ricardo Ovaldia, MS
Statistician 
Oklahoma City, OK



