# st: AW: Logistic Regression_Unequal Ns (outcomes)

From: "Martin Weiss" · Subject: st: AW: Logistic Regression_Unequal Ns (outcomes) · Date: Sun, 8 Mar 2009 14:48:00 +0100

```

The problem with such an unequal distribution of the dependent variable is
that you would have a hard time beating the naive model, i.e. predicting that
no one ever tests for an STD, without reference to any covariates. That would
classify 4688 of your 4764 respondents (about 98.4%) correctly.

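BTW, whether you model "test" or "no test" simply reverses the signs of the
coefficients in -logit- (the odds ratios become their reciprocals); the
standard errors and the log likelihood are unchanged.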

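*************
clear*
* cell counts from Chao's table; std = 1 means ever tested
input std resid freq
0 1 419
0 2 4269
1 1 46
1 2 30
end
* model "ever tested"
logit std resid [fweight = freq]
* swap the outcome coding so that 1 = never tested
replace std = 1 - std
* model "never tested": same coefficients, opposite signs
logit std resid [fweight = freq]
*************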

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chao Yawo
Sent: Sunday, 8 March 2009 14:34
To: statalist@hsphsun2.harvard.edu
Subject: st: Logistic Regression_Unequal Ns (outcomes)

Hello, I'm preparing to run a logit model predicting the odds of NOT
testing for an STD. As you can see from the table below, 4688 (about
98%) of respondents have my outcome of interest (i.e., have never been
tested for an STD). I realized that because of these unequal groups,
all cross-tabulations show higher proportions in the untested category.
I have a feeling that this could bias my estimates in some way. For
example, given the unequal groups, I think I am restricted to modeling
failure to test (the zero outcome), as modeling ever tested (1) could
lead to unstable estimates. So my question is: what impact could this
have on my model, and what can I do about it? Thanks - Chao

Ever been  |     Type of place of
tested for |        residence
STD        |         1          2 |     Total
-----------+----------------------+----------
         0 |     7.973      92.03 |       100
           |       419       4269 |      4688
-----------+----------------------+----------
         1 |      62.5       37.5 |       100
           |        46         30 |        76
-----------+----------------------+----------
     Total |     8.806      91.19 |       100
           |       465       4299 |      4764

Key:  row percentages
      number of observations

------------------------------------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
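Martin's two points can be written out with the counts from Chao's table. A short derivation, assuming $p = \Pr(\text{ever tested} \mid \mathrm{resid})$ and that `resid` (coded 1 and 2) enters linearly, as in the -logit- calls in the thread. Because `resid` takes only two values, the model is saturated, so the fitted slope is simply the log odds ratio of the 2×2 table of counts:

$$
\text{naive hit rate} = \frac{4688}{4764} \approx 0.984
$$

$$
\ln\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{resid}
\;\Longrightarrow\;
\ln\frac{1-p}{p} = -\beta_0 - \beta_1\,\mathrm{resid}
$$

$$
\hat\beta_1 = \ln\frac{30/4269}{46/419} = \ln\frac{30 \cdot 419}{4269 \cdot 46} \approx -2.75,
\qquad
e^{\hat\beta_1} \approx 0.064,
\qquad
e^{-\hat\beta_1} = \frac{1}{e^{\hat\beta_1}} \approx 15.6
$$

So when "never tested" is the outcome, the slope on `resid` is about +2.75 instead of about -2.75, and its odds ratio is the reciprocal of the "ever tested" one.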
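A Stata sketch that checks the naive classification rate and the sign flip, entering the cell counts as frequency weights as in Martin's code. The variable `notest` and the scalars `naive`, `b_test`, and `b_notest` are illustrative names that do not come from the thread; the sketch assumes a Stata version in which -summarize- and -logit- accept `fweight`s.

```stata
* Sketch: naive classification rate and the sign flip between codings,
* using the cell counts from Chao's table (std = 1: ever tested for an STD).
* notest, naive, b_test and b_notest are illustrative names only.
clear
input std resid freq
0 1 419
0 2 4269
1 1 46
1 2 30
end

* Naive model: predict "never tested" for everyone.
* Its hit rate is the weighted share of std == 0, i.e. 1 - mean(std).
quietly summarize std [fweight = freq]
scalar naive = 1 - r(mean)
display "naive classification rate = " naive

* Model "ever tested" and keep the slope on resid
logit std resid [fweight = freq]
scalar b_test = _b[resid]

* Model "never tested" by reversing the 0/1 coding
generate notest = 1 - std
logit notest resid [fweight = freq]
scalar b_notest = _b[resid]

* The slopes should match in size and differ in sign, so the
* odds ratios exp(b) are reciprocals of one another
display "slope, ever tested:  " b_test "   odds ratio: " exp(b_test)
display "slope, never tested: " b_notest "   odds ratio: " exp(b_notest)
```

Because reversing the coding leaves the likelihood unchanged, the two -logit- runs report the same standard errors and log likelihood; only the signs of the coefficients and z statistics change.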