
From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional logistic
Date: Thu, 25 Oct 2007 09:55:55 -0500

Ricardo Ovaldia <ovaldia@yahoo.com> asks,

> What is the difference between conditional logistic
> regression grouping on clinic and unconditional
> logistic regression including clinic as a dummy
> (indicator) variable? That is, what is the difference
> in model assumptions and parameter estimates?

The difference is that the unconditional logistic regression estimates are inconsistent and bad.

Let's deal with inconsistent first. Think of what happens as the number of observations goes to infinity. Let's denote the number of clinics as n and, just to make things easy, let's assume the number of observations within clinic is the same for each clinic, and is m. Then the total number of observations is N = n*m.

What happens as N->infinity? Presumably, the number of clinics increases. In this thought experiment, you are presumably imagining a replication of the world as we observe it, with clinics serving roughly the same number of patients, so as the number of patients grows, so does the number of clinics. Said in our notation, we are imagining n going to infinity and m remaining constant.

In standard logistic regression, that means we are estimating n-1 coefficients for the clinics. The number of coefficients is increasing at the same rate as the number of observations, with the result that the estimates do not converge and lack the usual statistical properties you are used to estimators having.

This may sound arcane, but it isn't, as you can show via simulation. Even easier, however, is to think about a simpler problem.

Consider standard logistic regression with a standard problem -- no clinics, nothing odd. We'll assume one RHS variable, say sex. It will not surprise you to hear that with just 4 observations, the estimates produced by the standard logistic regression estimator are bad.
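How little four observations can tell the estimator is easy to check by brute force. The following is a minimal Python sketch, not part of the original post. It assumes two patients of each sex and made-up true probabilities; the names `P_SEX0`, `P_SEX1`, `TRUE_BETA`, `mle_sex_coefficient`, and `outcome_probability` are all hypothetical. With one binary regressor the maximum likelihood estimate of the sex coefficient is the log odds ratio of the two group proportions, so the sketch simply enumerates all 16 possible outcome vectors.

```python
import math
from itertools import product

# Hypothetical setup: observations 1-2 have sex=0, observations 3-4 have sex=1.
P_SEX0 = 1 / (1 + math.exp(0.5))   # assumed P(outcome=1 | sex=0), logit = -0.5
P_SEX1 = 1 / (1 + math.exp(-0.5))  # assumed P(outcome=1 | sex=1), logit = +0.5
TRUE_BETA = 1.0                    # implied true coefficient on sex

def mle_sex_coefficient(y):
    """Log odds ratio of the group proportions, or None if no finite MLE exists."""
    s0 = y[0] + y[1]  # successes among sex=0
    s1 = y[2] + y[3]  # successes among sex=1
    if s0 in (0, 2) or s1 in (0, 2):
        return None   # a group with all 0s or all 1s pushes the fit to infinity
    return math.log((s1 / (2 - s1)) / (s0 / (2 - s0)))

def outcome_probability(y):
    """Probability of observing outcome vector y under the assumed truth."""
    prob = 1.0
    for yi, p in zip(y, (P_SEX0, P_SEX0, P_SEX1, P_SEX1)):
        prob *= p if yi else 1 - p
    return prob

finite_estimates = set()
prob_finite = 0.0
total_prob = 0.0
for y in product((0, 1), repeat=4):
    prob = outcome_probability(y)
    total_prob += prob
    beta_hat = mle_sex_coefficient(y)
    if beta_hat is not None:
        finite_estimates.add(beta_hat)
        prob_finite += prob

print("true coefficient:         ", TRUE_BETA)
print("finite estimates possible:", sorted(finite_estimates))
print("P(finite estimate exists):", round(prob_finite, 4))
```

In this assumed design a finite estimate exists only when each sex has exactly one success, and then the estimate is 0 whatever the truth is. Since 2p(1-p) never exceeds 1/2, that happens less than a quarter of the time for any true probabilities.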
The estimates would turn good if we added more observations, but with just 4 the asymptotics have not yet kicked in, and the estimates produced by the standard logistic regression estimator are bad, not merely poor. By poor, I mean noisy. By bad, I mean biased, wrong, and having no good properties.

Now let's consider the clinics. Let's pretend we have 1,000 clinics and 4 observations per clinic. What running

        . xi: logistic outcome sex i.clinic

amounts to is running separate logistic regressions for each clinic, but with the constraint that the coefficient on sex is the same across them. I just told you that with 4 observations, standard logistic is bad. Combining 1,000 bad results does not improve them; they are still bad. If the results were merely poor -- noisy -- then combining them would help, but that's not our case.

On the other hand, if in N = n*m -> infinity we held n constant and let m->infinity, we would get good results. With m going to infinity, you have a world in which the number of clinics remains fixed but the number of observations within clinic increases. Under those circumstances, each logistic regression would turn good once m got large enough, and combining the results would make them even better.

So does it matter which thought experiment is in your mind? No. Whether you imagine n->infinity or m->infinity, if you have m=4, you have insufficient observations for the standard logistic regression estimator, and results will be bad. If you have m=20, then in most circumstances you do have sufficient observations for the logistic estimator to work. But if you were to get more data and the first thought experiment is the correct one, meaning the number of clinics increases, the estimates will not get better, and that should disturb you. More data usually means better estimates.
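The many-small-clinics thought experiment can be tried numerically. The following is a rough Python sketch, not from the original post. It assumes 5,000 clinics of 4 patients, normally distributed clinic intercepts, a 50/50 sex split, and a true sex coefficient of 1. Every name in it (`simulate`, `dummy_variable_score`, `conditional_score`, and so on) is hypothetical. It solves the dummy-variable logistic likelihood by profiling out each clinic's intercept and, for comparison, solves the conditional likelihood from the original question. Clinics where the outcome or sex does not vary carry no information about the sex coefficient in either estimator and are skipped.

```python
import math
import random
from collections import Counter

M = 4            # patients per clinic, as in the post's example
TRUE_BETA = 1.0  # assumed true coefficient on sex

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def bisect(f, lo, hi, iters=60):
    """Root of a monotone f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n_clinics, rng):
    """Count informative clinics by (n1, s1, s0): n1 = # with sex=1,
    s1 and s0 = successes among sex=1 and sex=0."""
    cells = Counter()
    for _ in range(n_clinics):
        alpha = rng.gauss(0.0, 1.0)  # assumed clinic-specific intercept
        n1 = s1 = s0 = 0
        for _ in range(M):
            x = 1 if rng.random() < 0.5 else 0
            y = 1 if rng.random() < expit(alpha + TRUE_BETA * x) else 0
            n1 += x
            s1 += x * y
            s0 += (1 - x) * y
        if 0 < n1 < M and 0 < s1 + s0 < M:
            cells[(n1, s1, s0)] += 1
    return cells

def dummy_variable_score(beta, cells):
    """Score for beta after maximizing over each clinic's own intercept."""
    total = 0.0
    for (n1, s1, s0), count in cells.items():
        n0, s = M - n1, s1 + s0
        alpha = bisect(lambda a: n1 * expit(a + beta) + n0 * expit(a) - s,
                       -30.0, 30.0)
        total += count * (s1 - n1 * expit(alpha + beta))
    return total

def conditional_score(beta, cells):
    """Score for beta conditioning on each clinic's total number of successes."""
    total = 0.0
    for (n1, s1, s0), count in cells.items():
        n0, s = M - n1, s1 + s0
        weights = {k: math.comb(n1, k) * math.comb(n0, s - k) * math.exp(beta * k)
                   for k in range(max(0, s - n0), min(n1, s) + 1)}
        expected = sum(k * w for k, w in weights.items()) / sum(weights.values())
        total += count * (s1 - expected)
    return total

rng = random.Random(2007)
cells = simulate(5000, rng)
beta_dummies = bisect(lambda b: dummy_variable_score(b, cells), -10.0, 10.0)
beta_clogit = bisect(lambda b: conditional_score(b, cells), -10.0, 10.0)

print("informative clinics:    ", sum(cells.values()))
print("true coefficient:       ", TRUE_BETA)
print("dummy-variable logistic:", round(beta_dummies, 3))
print("conditional logistic:   ", round(beta_clogit, 3))
```

Raising the number of clinics in `simulate` while leaving `M` at 4 is the first thought experiment (n->infinity, m fixed); raising `M` instead is the second.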
Due to mathematical trickery, the conditional logistic estimator does not estimate the individual coefficients for each clinic, and so it avoids the problem of the number of estimated coefficients growing at the same rate as the number of observations, regardless of how the increase is split between n and m.

I told you that, with just 4 observations, standard logistic regression is bad. So would be the conditional logistic regression with just one clinic. But unlike the standard logistic estimator, if you hold the size of clinics constant and increase the number of them, results get better and better. Give me a dataset with 20 clinics, and in most cases, I'm in asymptopia. Results are trustworthy and, given more data, they just get better and better.

-- Bill
wgould@stata.com

P.S. Let me add a footnote to the argument above. The footnote is unimportant for the argument made, but is important in linear regression problems.

The gist of the problem in the standard logistic regression estimator is that the number of estimated parameters increases at the same rate as the number of observations. The same could be said of the linear regression (LR) estimator, and yet there is no problem because of it. Why? Because in the LR estimator, the problem of estimating the clinic intercepts can be separated from the problem of estimating the sex coefficient. It just turns out that way because of the linear nature of the linear-regression estimator. The same is not true of logistic.

The logic, "if the number of estimates increases at the same rate as the number of observations, there will be problems," is generally true, the exception being cases where there is a particular kind of separability, which happens only in the linear case.
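The separability described in the P.S. can be checked directly in the linear case. The following is a small Python sketch, not from the original post, using made-up data for 3 clinics of 4 patients. The data and the names `solve` and `demean` are hypothetical. It fits the linear regression with one dummy per clinic by solving the normal equations, then computes the sex coefficient a second way without estimating any clinic intercepts, by subtracting clinic means from both variables.

```python
# Hypothetical data: 3 clinics, 4 patients each, continuous outcome y.
clinic = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
sex = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
y = [2.1, 3.4, 1.8, 3.0, 5.2, 4.9, 4.1, 5.5, 0.7, 1.1, 2.0, 0.4]

def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Route 1: least squares on sex plus one dummy per clinic (no constant).
rows = [[float(s)] + [1.0 if c == g else 0.0 for g in range(3)]
        for s, c in zip(sex, clinic)]
k = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
beta_dummies = solve(xtx, xty)[0]

# Route 2: remove clinic means, then regress y on sex with no intercepts at all.
def demean(values):
    out = []
    for v, c in zip(values, clinic):
        group = [w for w, g in zip(values, clinic) if g == c]
        out.append(v - sum(group) / len(group))
    return out

sex_d, y_d = demean(sex), demean(y)
beta_within = sum(a * b for a, b in zip(sex_d, y_d)) / sum(a * a for a in sex_d)

print("sex coefficient with clinic dummies:", beta_dummies)
print("sex coefficient after demeaning:    ", beta_within)
```

The two routes agree up to rounding error, which is the separability at work: the sex coefficient does not depend on how well any individual clinic intercept is estimated. No comparable transformation removes the clinic intercepts from the unconditional logistic likelihood.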
