Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it

 From Doug Hess <[email protected]> To [email protected] Subject Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it Date Tue, 20 Sep 2011 10:10:24 -0400

```Thank you! Very clear and very helpful. The distinction between
discrimination and calibration is tricky to wrap one's head around
when starting out with these. (Still not sure I fully get it!)

Thanks again.

-Doug

Date: Mon, 19 Sep 2011 22:05:41 +0000
From: Clyde B Schechter <[email protected]>
Subject: Re: st: interpretting the estat gof commands and Hosmer
Lemeshow version of it

Doug Hess posted a question about the output of -estat gof- with and
without the group option.

. estat gof
number of observations = 143585
number of covariate patterns = 108638
Pearson chi2(108575) = 106784.16
Prob > chi2 = 0.9999

So, nearly all of the covariate patterns will be singletons; a few
might be pairs, triplets, or quartets, but I doubt even 1% of the
patterns will be instantiated as many as four times.  Your "cells" are
simply too small for the chi-square test to work, even though you have
scads of them.  (More formally, the test statistic has an approximate
chi-square distribution, but the approximation is poor if there are
substantial numbers of cells with small expected values.)

. estat gof, g(10) table
number of observations = 143585
number of groups = 10
Hosmer-Lemeshow chi2(8) = 322.31

Prob > chi2 = 0.0000

Decile  Pred Prob  Obs y=1  Exp y=1    Total  Diff  % diff
     1      0.019      115      190   14,359    75     65%
     2      0.025      194      315   14,358   121     62%
     3      0.034      305      419   14,359   114     37%
     4      0.044      443      560   14,359   117     26%
     5      0.055      671      704   14,361    33      5%
     6      0.072      864      904   14,355    40      5%
     7      0.100    1,379    1,213   14,359   166     12%
     8      0.163    2,122    1,827   14,358   295     14%
     9      0.302    3,615    3,207   14,359   408     11%
    10      0.856    6,175    6,543   14,358   368      6%
  Sum=              15,883   15,883  143,585

With 143,585 observations distributed over 10 cells, you are running a test with
power to detect even minuscule departures from perfect fit.  I think
you are better off ignoring the p-value and just looking at the fit as
indicated in the %diff column of your output.  You are pretty good in
the middle, and not bad at the upper end of the predicted risk
distribution.  But your predictions are seriously off at the lower
range, where the % diff runs from 26% to 65%.

So this is not a particularly well calibrated model.  If good
calibration is really important for your purposes, you might try to
fix it by identifying whether some of your linear terms would be
better represented by low degree polynomials or splines, or by
including some interaction effects (or taking some out!)  (Graphical
exploration would be a good guide.)  Or, perhaps there are some
important missing predictors.

Of course, you have very good discrimination at AROC = 0.83, and if
outcome, calibration may not be so important.  On the other hand, if
you intend to use the predicted probabilities themselves, then you

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
```
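
The arithmetic behind the grouped table can be re-checked from the posted counts alone. What follows is a minimal Python sketch, not Stata's own implementation. It assumes the usual Hosmer-Lemeshow form, the sum over groups of (O - E)^2 / (E (1 - E/n)), referred to a chi-square distribution with (number of groups - 2) degrees of freedom. The function names are made up for illustration. Because the expected counts in the post are rounded to whole numbers, the recomputed statistic should land near the reported 322.31 rather than exactly on it.

```python
import math

# (observed y=1, expected y=1, group total) for deciles 1..10,
# copied from the table posted in the thread.
DECILES = [
    (115, 190, 14359),
    (194, 315, 14358),
    (305, 419, 14359),
    (443, 560, 14359),
    (671, 704, 14361),
    (864, 904, 14355),
    (1379, 1213, 14359),
    (2122, 1827, 14358),
    (3615, 3207, 14359),
    (6175, 6543, 14358),
]


def pct_diff(obs, exp):
    """Absolute gap between observed and expected, as a percent of observed."""
    return 100 * abs(obs - exp) / obs


def hosmer_lemeshow(rows):
    """Hosmer-Lemeshow statistic: sum of (O - E)^2 / (E * (1 - E/n))."""
    return sum((o - e) ** 2 / (e * (1 - e / n)) for o, e, n in rows)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, using the closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    # Accumulate sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


pct = [round(pct_diff(o, e)) for o, e, _ in DECILES]
hl = hosmer_lemeshow(DECILES)
p_value = chi2_sf_even_df(hl, df=len(DECILES) - 2)

print("percent diff by decile:", pct)
print("Hosmer-Lemeshow chi2(8): %.2f" % hl)
print("Prob > chi2: %.4f" % p_value)
```

The `pct` list reproduces the posted % diff column only when the gap is divided by the observed count, so the percentages in the table are relative to observed, not expected, events. The p-value printed to four decimals is far too small to distinguish from zero, which fits the point made in the reply: with this many observations in 10 cells, the size of the gaps is more informative than the test.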