
st: AW: Logistic Regression_Unequal Ns (outcomes)

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	st: AW: Logistic Regression_Unequal Ns (outcomes)
Date	Sun, 8 Mar 2009 14:48:00 +0100


The problem with such an unequal distribution of the dependent variable is
that you would have a hard time beating the naive model, i.e. predicting,
without reference to any covariates, that no one ever tests for an STD. That
alone would classify 98.4% (4688 of 4764) of your sample correctly...
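
To make the comparison with the naive model concrete, here is a minimal
sketch, assuming the four cell counts from the table quoted below are the
whole dataset. It tabulates the outcome (the share of std == 0 is the naive
model's accuracy) and then asks -estat classification- how the fitted model
does at the default 0.5 cutoff. With only resid as a covariate the fitted
probabilities are just the group proportions, 46/465 and 30/4299, both far
below 0.5, so the fitted model also predicts "never tested" for everyone.

*************
clear
input std resid freq
0 1 419
0 2 4269
1 1 46
1 2 30
end
* share of std == 0 is the accuracy of always predicting "never tested"
tabulate std [fweight = freq]
* fitted model for comparison
logit std resid [fweight = freq]
* see the "Correctly classified" line (default cutoff 0.5)
estat classification
*************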

BTW, whether you model "test" or "no test" simply flips the sign of the
coefficients in -logit- (equivalently, the odds ratios are inverted):

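*************
clear*
* cell counts from the cross-tabulation: outcome, residence type, frequency
inp std resid freq
0 1 419
0 2 4269
1 1 46
1 2 30
end
* model "ever tested" (std == 1)
logit std resid [fweight = freq]
* swap the coding so that 1 = "never tested"
recode std (1=0) (0=1)
* same log likelihood, coefficients with the opposite sign
logit std resid [fweight = freq]
*************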
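
If you want to check the sign flip directly rather than compare two output
tables by eye, something along these lines should do. It is only a sketch:
notest and b_test are names made up for the illustration, and the data are
again the four cell counts from the table.

*************
clear
input std resid freq
0 1 419
0 2 4269
1 1 46
1 2 30
end
* fit the "ever tested" model and keep the slope
logit std resid [fweight = freq]
scalar b_test = _b[resid]
* notest is a hypothetical name for the reversed outcome (1 = never tested)
generate notest = 1 - std
logit notest resid [fweight = freq]
* the two slopes should cancel up to numerical precision
display b_test + _b[resid]
* replay as odds ratios: exp(b) becomes 1/exp(b) after the swap
logit, or
*************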



HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Chao Yawo
Sent: Sunday, 8 March 2009 14:34
To: [email protected]
Subject: st: Logistic Regression_Unequal Ns (outcomes)

Hello, I'm preparing to run a logit model predicting the odds of NOT
testing for an STD. As you can see from the table below, 4688 (about
98%) of respondents have my outcome of interest (i.e., have not tested
for an STD). Because of these unequal groups, all cross-tabulations
show higher proportions within the untested category. I have a feeling
that this could bias my estimates in some way. For example, given the
unequal groups, I think I am restricted to modeling failure to test
(the zero outcome), as modeling ever tested (1) could lead to unstable
estimates. So my question is: what possible impact will this have on
my model, and what can I do about it? Thanks - Chao


     Ever |
     been |   Type of place of
   tested |       residence
  for STD |         1          2 |     Total
----------+----------------------+----------
        0 |     7.973      92.03 |       100
          |       419       4269 |      4688
----------+----------------------+----------
        1 |      62.5       37.5 |       100
          |        46         30 |        76
----------+----------------------+----------
    Total |     8.806      91.19 |       100
          |       465       4299 |      4764

  Key:  row percentages
        number of observations


------------------------------------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



