
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Hosmer Lemeshow table incorrectly reported probabilities?

From	"Visintainer, Paul" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: Hosmer Lemeshow table incorrectly reported probabilities?
Date	Fri, 26 Aug 2011 10:21:22 -0400


Matthew, you can replicate the H-L cutpoints by hand to see how -estat gof- generates the groups and the expected values:

   . sysuse auto

Run a logistic and generate the predicted probabilities:

   . qui logistic foreign price trunk mpg
   . predict phat

Create 5 groups (or 10 for the H-L test) of approximately equal size:

   . xtile group=phat, nq(5)
         
Find the maximum predicted probability in each group, which is that group's upper cutpoint (I rounded this to match the H-L table output):

   . egen cutpoint=max(round(phat,.0001)), by(group)

Generate the expected counts within groups (I rounded this to match the H-L table):

   . egen x=total(phat), by(group)
   . gen Exp_1=round(x,.1)
         

Then, just tabulate your output and compare it to the H-L table:

   . estat gof, group(5) table
   . tab cutpoint    // <-- "cutpoint" is given as "Prob" in the H-L table
   . tab Exp_1

Exp_0 is the difference between the group total and Exp_1.
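
As a rough sketch of that last step, Exp_0 can be generated the same way as Exp_1. This continues the session above, so it assumes phat, group and x already exist; grp_n and Exp_0 are illustrative variable names, not something -estat gof- creates:

```stata
* Continues the session above: phat, group and x must already exist.
* grp_n and Exp_0 are illustrative names chosen for this sketch.
egen grp_n = count(phat), by(group)     // number of observations in each group
generate Exp_0 = round(grp_n - x, .1)   // group total minus expected 1s
tab Exp_0
```

The rounding mirrors the one applied to Exp_1, so the values should line up with the Exp_0 column of the table.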

To see every observation's predicted probability and outcome within its group:

   . sort phat
   . bys group: list phat foreign

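The H-L test simply sorts the observations by predicted probability, splits them into groups of approximately equal size, and sums the predicted probabilities within each group to get the expected count of events. The observed counts are then compared with the expected counts. The "Prob" column is the largest predicted probability in each group (the group's upper cutpoint), not the group's mean predicted probability, so it will not equal Exp_1/Total.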
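
One way to see all of this in a single table is to collapse the data to one row per group. The following is only a sketch, assuming phat and group were created as above; every new variable name (prob, mean_p, obs_1, exp_1, total, obs_0, exp_0, term) is made up for the illustration. It also recomputes the chi-squared statistic as the usual sum of (observed - expected)^2/expected over both outcome columns, with groups - 2 degrees of freedom:

```stata
* Sketch only: rebuild the H-L table by hand from phat and group.
* preserve/restore keeps the original data intact.
preserve
collapse (max) prob=phat (mean) mean_p=phat (sum) obs_1=foreign exp_1=phat (count) total=phat, by(group)
generate obs_0 = total - obs_1
generate exp_0 = total - exp_1
list, noobs clean
* Contribution of each group to the statistic, over both outcome columns
generate term = (obs_1 - exp_1)^2/exp_1 + (obs_0 - exp_0)^2/exp_0
quietly summarize term
* After collapse, _N is the number of groups, so df = _N - 2
display "chi2 = " r(sum) "   p = " chi2tail(_N - 2, r(sum))
restore
```

In this layout mean_p is exp_1/total, the quantity Matthew computed by hand, while prob is the largest predicted probability in the group, which is what the "Prob" column reports. If predicted probabilities are tied across a cutpoint, -xtile- and -estat gof- may not split the groups identically, so small differences from the official table are possible.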

Paul

________________________________________________
Paul F. Visintainer, PhD
Baystate Medical Center
Springfield, MA 01199

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Matthew Baldwin, MD
Sent: Thursday, August 25, 2011 5:41 PM
To: [email protected]
Subject: st: Hosmer Lemeshow table incorrectly reported probabilities?

Dear Statalisters,

I created a logistic regression model that predicts mortality. I am
examining the model's calibration, and the predicted probabilities for
quintiles of risk seem to be calculated incorrectly by Stata.

Using the code "estat gof, group(5) table" gives me the 5 groups used for
the Hosmer-Lemeshow chi-squared test of model calibration, along with
predicted probabilities (in this case, probability of death), observed
and expected counts for both outcomes (deaths in this case), and totals
for each group.

However, the expected probabilities that the "estat gof, group(5)
table" code in Stata gives are different from the expected
probabilities I calculate from the expected counts in the table
itself. I am confused about this:

  Logistic model for death2, goodness-of-fit test

   (Table collapsed on quantiles of estimated probabilities)
   +--------------------------------------------------------+
   | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
   |-------+--------+-------+-------+-------+-------+-------|
   |     1 | 0.0958 |    15 |  20.1 |   296 | 290.9 |   311 |
   |     2 | 0.1757 |    47 |  41.7 |   264 | 269.3 |   311 |
   |     3 | 0.3006 |    66 |  73.3 |   245 | 237.7 |   311 |
   |     4 | 0.4494 |   119 | 114.8 |   192 | 196.2 |   311 |
   |     5 | 0.9328 |   183 | 180.2 |   127 | 129.8 |   310 |
   +--------------------------------------------------------+

        number of observations =      1554
              number of groups =         5
       Hosmer-Lemeshow chi2(3) =         3.46
                   Prob > chi2 =         0.3265


Here are my calculations for expected probabilities based on the table:

x (expected mortality)     y (observed mortality)
20.1/311 = 0.065           15/311 = 0.048
41.7/311 = 0.13            47/311 = 0.15
73.3/311 = 0.24            66/311 = 0.21
114.8/311 = 0.37           119/311 = 0.38
180.2/310 = 0.58           183/310 = 0.59


Can anyone explain how the expected probabilities in the table were derived?

Is the "estat gof, group(x) table" command in Stata reporting incorrect
expected probabilities?

Thanks,

Matthew Baldwin, MD
Department of Pulmonary, Allergy, and Critical Care Medicine
New York Presbyterian Hospital
Columbia University College of Physicians and Surgeons


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
