
From: jverkuilen <jverkuilen@gc.cuny.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Logistic Regression_Unequal Ns (outcomes)
Date: Mon, 9 Mar 2009 00:59:29 -0400

I agree with Rich, the classification table is even more useless than it normally is here. That doesn't mean modeling is useless, though. If it did, many epidemiological studies would be busted, because the events are often quite rare. Performance of the model will need to be judged on different grounds. I would suggest using an ROC approach or considering odds ratios, both of which are pretty common in this kind of thing.

Of course, one problem with highly skewed margins is that the effective n is primarily determined by the minimum frequency of events or non-events. Which is to say, be prepared to be disappointed with large standard errors.

If I recall correctly, -relogit- is for case-control studies, which might be a more useful approach. Worth looking into.

JV

-----Original Message-----
From: "Richard Williams" <Richard.A.Williams.5@ND.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Sent: 3/8/2009 11:43 AM
Subject: Re: st: Logistic Regression_Unequal Ns (outcomes)

At 08:34 AM 3/8/2009, Chao Yawo wrote:
>Hello, I'm preparing to run a logit model predicting the odds of NOT
>testing for an STD. As you can see from the table below, 4688 (about
>98%) of respondents have my outcome of interest (i.e., have not tested
>for an STD). I realized that because of these unequal groupings, all
>crosstabulations have higher proportions within the untested category.
>
>I have a feeling that these could bias my estimates in a way. For
>example, given the unequal groupings, I think I am only restricted to
>modeling failure to test (the zero outcome), as modeling for ever
>tested (1) could lead to unstable estimates. So my question is what
>possible impact will this have on my model, and what can I do about
>it? Thanks - Chao

Like Martin says, it doesn't matter which is one and which is zero. Also, my experience is that the classification table, which I never use all that much anyway, is especially worthless when you have such an extreme split. You may wish to check into Gary King's -relogit-. See

http://gking.harvard.edu/stats.shtml

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
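
To make the point about classification tables versus ROC concrete, here is a rough sketch in Python. It is not from the thread: the data are made up (100 hypothetical respondents with the same 98/2 split), and the helper names `accuracy` and `auc` are illustrative only. A "model" that predicts the majority outcome for everyone looks excellent in a classification table, while the area under the ROC curve shows it cannot discriminate at all.

```python
def accuracy(labels, preds):
    """Share of correct classifications (what a classification table summarizes)."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def auc(labels, scores):
    """Rank-based area under the ROC curve: the probability that a randomly
    chosen 1 gets a higher score than a randomly chosen 0 (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 98 respondents with outcome 1 (never tested), 2 with outcome 0.
labels = [1] * 98 + [0] * 2

# An uninformative "model": same predicted class and same score for everyone.
always_one = [1] * 100
constant_scores = [0.98] * 100

print(accuracy(labels, always_one))   # 0.98, looks impressive
print(auc(labels, constant_scores))   # 0.5, no better than chance
```

Note also that the AUC here rests on only 98 x 2 pairs, which echoes the remark above that the effective n is driven by the rarer outcome.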

