
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: problem with predicted probabilities


From   Richard Goldstein <richgold@ix.netcom.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: problem with predicted probabilities
Date   Tue, 04 Feb 2014 08:17:22 -0500

My view of the classification table is slightly different. I certainly
agree that automatically using a cutoff of .5 is not always a good idea;
in particular, it is a bad idea when the prevalence of the event in the
data is very different from .5 (e.g., .12 or .88). As an alternative,
use the cutoff() option of -estat classification- and start with the
prevalence as the cutoff; substantive experts in your area may suggest
other reasonable cutoffs.
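
A minimal sketch of that suggestion, assuming the data from the original
post are in memory and the outcome is coded 0/1 as the poster states. The
variable names are copied from the quoted -logistic- command; the local
macro name prev is only illustrative.

```stata
* Refit the model from the quoted post (variable names taken from that post)
logistic Health_stat age maried wealth educat place sex

* Prevalence of the outcome in the estimation sample
* (for a 0/1 variable the mean is the proportion of 1s)
summarize Health_stat if e(sample), meanonly
local prev = r(mean)

* Classification table with the prevalence as cutoff instead of the default .5
estat classification, cutoff(`prev')
```

With 370 events among 2339 observations, as in the quoted table, the
prevalence is about .158, so cases with predicted Pr(D) >= .158 would be
classified as +.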

Rich

On 2/4/14, 12:04 AM, Witness Chirinda wrote:
> Thanks Nick and Richard for your help!
> 
> On Sun, Feb 2, 2014 at 11:23 PM, Richard Williams
> <richardwilliams.ndu@gmail.com> wrote:
>> Getting every case classified as 0 (or 1) is not unusual. For relatively
>> rare events, even the highest predicted probability in the sample may be
>> less than .5, so every case gets classified as 0. In my experience, the
>> classification table tends not to be that helpful, especially for events
>> that are very rare or very common.
>>
>>
>> At 04:58 AM 2/2/2014, Witness Chirinda wrote:
>>>
>>> Dear Statalist
>>> I want to obtain some predicted probabilities after logistic
>>> regression, as attached. I want to use the predicted probabilities in
>>> my next step instead of the observed prevalence, since the predicted
>>> probabilities are adjusted for other (socio-demographic) factors.
>>> My problem is that when I run -estat classification- it gives 0s
>>> for the + classification. I am sure I am doing something wrong somewhere.
>>> Please see the output below. All variables used in the model have been
>>> recoded to be binary 1/0.
>>>
>>> Thanks for any help!
>>> ------------------
>>>
>>>
>>> . logistic Health_stat  age maried wealth educat  place sex
>>>
>>> Logistic regression                               Number of obs   =       2339
>>>                                                   LR chi2(6)      =      50.61
>>>                                                   Prob > chi2     =     0.0000
>>> Log likelihood = -996.02516                       Pseudo R2       =     0.0248
>>>
>>> ------------------------------------------------------------------------------
>>>  Health_stat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>>          age |   1.109083   .0342696     3.35   0.001     1.043909    1.178326
>>>       maried |     1.2134   .1962535     1.20   0.232     .8837556    1.666004
>>>       Wealth |   1.430957   .1784661     2.87   0.004     1.120641    1.827203
>>>       educat |   1.670411   .2010455     4.26   0.000     1.319397     2.11481
>>>        place |   .9334522   .1223134    -0.53   0.599     .7220318    1.206779
>>>          sex |   1.129008   .1324642     1.03   0.301     .8970722    1.420911
>>>
>>> . estat class
>>>
>>> Logistic model for poorSRHS
>>>               -------- True --------
>>> Classified |         D            ~D  |      Total
>>> -----------+--------------------------+-----------
>>>      +     |         0             0  |          0
>>>      -     |       370          1969  |       2339
>>> -----------+--------------------------+-----------
>>>    Total   |       370          1969  |       2339
>>>
>>> Classified + if predicted Pr(D) >= .5
>>> True D defined as poorSRHS != 0
>>> --------------------------------------------------
>>> Sensitivity                     Pr( +| D)    0.00%
>>> Specificity                     Pr( -|~D)  100.00%
>>> Positive predictive value       Pr( D| +)       .%
>>> Negative predictive value       Pr(~D| -)   84.18%
>>> --------------------------------------------------
>>> False + rate for true ~D        Pr( +|~D)    0.00%
>>> False - rate for true D         Pr( -| D)  100.00%
>>> False + rate for classified +   Pr(~D| +)       .%
>>> False - rate for classified -   Pr( D| -)   15.82%
>>> --------------------------------------------------
>>> Correctly classified                        84.18%
>>> --------------------------------------------------
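
The all-negative table above does not prevent obtaining the predicted
probabilities themselves. A sketch under the same assumptions (same data in
memory, variable names copied from the quoted command); phat is a
hypothetical name for the new variable.

```stata
* Refit the model from the quoted post
logistic Health_stat age maried wealth educat place sex

* Predicted probability of a positive outcome for each observation
predict phat if e(sample), pr

* If the maximum of phat is below .5, no case can be classified as +
* at the default cutoff, which would produce the table shown above
summarize phat

* Plot sensitivity and specificity against every possible cutoff
lsens
```

The variable phat holds the covariate-adjusted probabilities the original
post asks for, and -lsens- may help in choosing a cutoff other than .5.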

