RE: st: Logistic Regression_Unequal Ns (outcomes)

From	jverkuilen <[email protected]>
To	<[email protected]>
Subject	RE: st: Logistic Regression_Unequal Ns (outcomes)
Date	Mon, 9 Mar 2009 00:59:29 -0400

I agree with Rich: with a split this extreme, the classification table is even more useless than it normally is. That doesn't mean modeling is useless, though. If it were, many epidemiological studies would be busted, because the events they study are often quite rare.
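To see why the classification table says so little here, a rough sketch in Python may help. The 4688 untested respondents come from Chao's post; the 96 tested is only my back-of-envelope figure implied by "about 98%", not a number from the original table. The sketch assumes that, with an event this rare, no fitted probability of testing reaches the usual 0.5 cutoff, so everyone is classified as untested.

```python
# 4688 is from the thread; 96 is an ASSUMPTION implied by "about 98%",
# not a count taken from the original table.
n_untested = 4688
n_tested = 96

# Assumed situation: every fitted probability of testing is below 0.5,
# so the 0.5 cutoff classifies all respondents as untested (0).
actual = [0] * n_untested + [1] * n_tested
predicted = [0] * len(actual)

# Overall percent correctly classified, as a classification table reports it.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Share of the truly tested respondents that the cutoff picks up.
true_pos = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
sensitivity = true_pos / n_tested

print(f"accuracy    = {accuracy:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
```

Under those assumptions the table reports about 98% correct while identifying none of the tested respondents, which is why the percent-correct figure tells you next to nothing about the model.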

Performance of the model will need to be judged on different grounds. I would suggest using an ROC approach or looking at the odds ratios, both of which are pretty common in this kind of thing. Of course, one problem with highly skewed margins is that the effective n is determined primarily by the smaller of the two frequencies, events or non-events. Which is to say: be prepared to be disappointed by large standard errors.
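The point about effective n can be illustrated with the usual large-sample standard error of a log odds ratio from a 2x2 table, sqrt(1/a + 1/b + 1/c + 1/d). This is only a sketch: the cell counts are hypothetical (96 events split evenly across a made-up binary predictor), and the function name is mine, not anything from Stata.

```python
import math

def se_log_odds_ratio(a, b, c, d):
    """Large-sample SE of the log odds ratio for a 2x2 table with cells a, b, c, d."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# HYPOTHETICAL counts: 96 events, split evenly over a binary predictor.
events_exposed, events_unexposed = 48, 48

# The SE can never drop below this, however many non-events are added.
se_floor = math.sqrt(1 / events_exposed + 1 / events_unexposed)

for n_nonevents in (96, 960, 4688, 46880):
    half = n_nonevents // 2  # non-events also split evenly, for simplicity
    se = se_log_odds_ratio(events_exposed, events_unexposed, half, half)
    print(f"non-events = {n_nonevents:6d}   SE = {se:.4f}   floor = {se_floor:.4f}")
```

Going from 96 to 4688 non-events moves the standard error only from roughly 0.29 to roughly 0.21, and a tenfold further increase barely changes it, because the two small event cells dominate the sum.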

If I recall correctly, -relogit- was written for rare-events data, including case-control designs, so it might be a more useful approach. Worth looking into.

JV
-----Original Message-----
From: "Richard Williams" <[email protected]>
To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
Sent: 3/8/2009 11:43 AM
Subject: Re: st: Logistic Regression_Unequal Ns (outcomes)

At 08:34 AM 3/8/2009, Chao Yawo wrote:
>Hello, I'm preparing to run a logit model predicting the odds of NOT
>testing for an STD.  As you can see from the table below, 4688 (about
>98%) of respondents have my outcome of interest (i.e., have not tested
>for an STD).  I realized that because of these unequal groupings, all
>crosstabulations have higher proportions within the untested category.
>  I have a feeling that these could bias my estimates in a way. For
>example, given the unequal groupings, I think I am only restricted to
>modeling failure to test (the zero outcome), as modeling for ever
>tested (1) could lead to unstable estimates.  So my question is  what
>possible impact will this have on my model, and what can I do about
>it?  Thanks - Chao

Like Martin says, it doesn't matter which is one and which is 
zero.  Also, my experience is that the classification table, which I 
never use all that much anyway,  is especially worthless when you 
have such an extreme split.

You may wish to check into Gary King's -relogit-.  See

http://gking.harvard.edu/stats.shtml

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

