[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
wgould@stata.com (William Gould, StataCorp LP) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: conditional logistic |

Date |
Fri, 26 Oct 2007 11:00:35 -0500 |

Concerning my posting on logistic regression with dummies vs. conditional logistic regression yesterday, Ricardo Ovaldia <ovaldia@yahoo.com> first quotes me, WG> The difference is that the logistic regression WG> estimates are inconsistent WG> and bad. and then writes, RO> I must have missed something. From Bill's argument I RO> see how this is true if you have few observation RO> within clinic (for example 4), but I am not sure if RO> the same holds if you have 4 clinics each with 200+ RO> observations which is the most ususal scenario. First, the theoretical statement: Standard logistic regression on four clinics each with 200+ observations will generate good results. One would usually expect to get even better results as the number of observations increases. In this case, however, if the number of observations increases by increasing the number of clinics while holding the number of observations within clinics constant, I cannot prove that results will get better. As a practical matter, Ricardo might be right. I just did a simulation where I assumed Case 1: lnodds(outcome_j) = 0 + 1*sex_j Case 2: lnodds(outcome_ij) = a_i + 1*sex_ij 200 observations in each clinic i. I did these simulations for overall sample sizes of 200, 400, 800, 1600, 3200, 6400, 12800, 25600, and 25600. Here are the results I got: Case 1 Case 2 _b[sex] _se[sex] _b[sex] _se[sex] ------------------------------------------------- 800 1.1090 .15 1.0474 .17 1600 .8713 .10 1.0494 .11 3200 .9475 .07 1.0474 .08 6400 .9898 .05 1.0803 .06 12800 .9886 .04 1.0849 .04 25600 1.0027 .03 1.0315 .03 ------------------------------------------------- Theoretically, I know that Case 1 converges. Theoretically, I cannot prove that Case 2 will converge. In this example, however, note that the standard errors on the sex coefficient are both heading down and are, in fact, roughly comparable. In this case, the constant clinic size model seems to be converging. If it is convering, I have a suspicion as to why: The model I simulated I could have fitted using a linear probablity model and that model would have fit the data about as well. All the RHS variables are dummies and even from linear regression, no prediction would have been below 0 or above 1. I also know that the linear regression model does converge in both cases 1 and 2. Thus, it would not surprise me that the logistic regression estimator works fine in this case. On the other hand, my simulation is inadequate because I have not verified that the coverage is right in case 2: I report the standard error for one regression in each row, and that reported standard error seems to be convering. I should do a Monte Carlo on each row and check that the reported standard error has the correct coverage. I could do all that. I should do all that. But I must ask: Asuming one is not auditing individual clinics, why use -logit- or -logistic- in this case when -clogit- is just as easy to use, is less computationally intensive, and theoretically known to be right? If one is auditing clinics, then we need to change the focus and start looking at how well -logit- and -logistic- estimate the individual clinic effects. -- Bill wgould@stata.com P.S. For anyone interested in pursuing this, here are the two programs I used to produce the table above. Do-file sim1.do makes columns 1 and 2 ov the table above: ----------------------------------------------------- sim1.do ---------------- clear all program onecl args N quietly { drop _all set obs `N' gen sex = cond(_n<=_N/2, 0, 1) gen lnodds = 1*sex gen p = exp(lnodds)/(1+exp(lnodds)) gen int outcome = (uniform()<=p) logit outcome sex } di `N' " " _b[sex] " " _se[sex] end set seed 1939998 onecl 200 onecl 400 onecl 800 onecl 1600 onecl 3200 onecl 6400 onecl 12800 onecl 25600 onecl 51200 ----------------------------------------------------- sim1.do ---------------- Do-file sim2.do makes columns 3 and 4: ----------------------------------------------------- sim2.do ---------------- clear all program twocl args N local m = 200 local n = `N'/`m' quietly { drop _all set obs `n' gen clinic = _n gen u = 2.5*(uniform()-.5) expand `m' sort clinic gen sex = cond(mod(_n,2)==0,0,1) gen lnodds = 1*sex + u gen p = exp(lnodds)/(1+exp(lnodds)) gen int outcome = (uniform()<=p) xi: logit outcome sex i.clinic } di `N' " " _b[sex] " " _se[sex] end set seed 1939998 twocl 800 twocl 1600 twocl 3200 twocl 6400 twocl 12800 twocl 25600 twocl 51200 twocl 102400 ----------------------------------------------------- sim2.do ---------------- <end> * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: conditional logistic***From:*Ricardo Ovaldia <ovaldia@yahoo.com>

- Prev by Date:
**RE: st: How to generate a cumulative sum (capital stock)** - Next by Date:
**st: random effects and test of correlation of residuals** - Previous by thread:
**RE: st: conditional logistic** - Next by thread:
**Re: st: conditional logistic** - Index(es):

© Copyright 1996–2017 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |