Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it

From   Doug Hess <[email protected]>
To   [email protected]
Subject   Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it
Date   Tue, 20 Sep 2011 10:10:24 -0400

Thank you! Very clear and very helpful. The distinction between
discrimination and calibration is tricky to wrap one's head around
when starting out with these. (Still not sure I fully get it!)

Thanks again.


Date: Mon, 19 Sep 2011 22:05:41 +0000
From: Clyde B Schechter <[email protected]>
Subject: Re: st: interpretting the estat gof commands and Hosmer
Lemeshow version of it

Doug Hess posted a question about the output of -estat gof- with and
without the group option.

. estat gof
number of observations = 143585
number of covariate patterns = 108638
Pearson chi2(108575) = 106784.16
Prob > chi2 = 0.9999

So, with 143,585 observations spread over 108,638 covariate patterns,
nearly all of the patterns will be singletons, and a few might be
pairs, triplets, or quartets; I doubt even 1% of the patterns will
occur as many as four times.  Your "cells" are
simply too small for the chi square test to work, even though you have
scads of them.  (More formally, the test statistic has an approximate
chi square distribution, but the approximation is poor if there are
substantial numbers of cells with small expected values.)
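
A quick arithmetic check of the point above, as a minimal Python sketch
using only the three figures in the -estat gof- output. It assumes the
usual rule that the Pearson degrees of freedom equal the number of
covariate patterns minus the number of estimated parameters (constant
included); the variable names are illustrative.

```python
# Figures copied from the -estat gof- output above.
n_obs = 143585
n_patterns = 108638
pearson_df = 108575

# Average number of observations sharing one covariate pattern.
obs_per_pattern = n_obs / n_patterns

# Assumption: Pearson df = patterns - estimated parameters (incl. constant),
# so the difference is the implied parameter count.
implied_params = n_patterns - pearson_df

# Every non-singleton pattern holds at least 2 observations, so
# n_obs >= s + 2 * (n_patterns - s), which gives a lower bound on the
# number of singleton patterns s.
min_singletons = 2 * n_patterns - n_obs
min_singleton_share = min_singletons / n_patterns

print(f"observations per pattern: {obs_per_pattern:.2f}")
print(f"implied estimated parameters: {implied_params}")
print(f"singleton patterns: at least {min_singletons} "
      f"({min_singleton_share:.0%} of patterns)")
```

With an average of about 1.3 observations per pattern, at least two
thirds of the patterns must be singletons, which is why so many cells
have tiny expected counts.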

. estat gof, g(10) table
number of observations = 143585
number of groups = 10
Hosmer-Lemeshow chi2(8) = 322.31
Prob > chi2 = 0.0000

Decile  Pred Prob  Obs y=1  Exp y=1    Total  Diff  % diff
     1      0.019      115      190   14,359    75     65%
     2      0.025      194      315   14,358   121     62%
     3      0.034      305      419   14,359   114     37%
     4      0.044      443      560   14,359   117     26%
     5      0.055      671      704   14,361    33      5%
     6      0.072      864      904   14,355    40      5%
     7      0.100    1,379    1,213   14,359   166     12%
     8      0.163    2,122    1,827   14,358   295     14%
     9      0.302    3,615    3,207   14,359   408     11%
    10      0.856    6,175    6,543   14,358   368      6%
  Sum=              15,883   15,883  143,585

Here your problem is that your sample is huge.  With 143,585
observations distributed over 10 cells, you are running a test with
power to detect even minuscule departures from perfect fit.  I think
you are better off ignoring the p-value and just looking at the fit as
indicated in the %diff column of your output.  You are pretty good in
the middle, and not bad at the upper end of the predicted risk
distribution.  But you are seriously overpredicting at the lower
end: expected events exceed observed in each of the first four deciles.
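
The table's Diff and % diff columns, and the Hosmer-Lemeshow statistic
itself, can be rechecked from the posted counts. The following is a
minimal Python sketch, not Stata's own computation: it applies the
standard Hosmer-Lemeshow formula to the rounded expected counts in the
table, so it should land close to the reported 322.31 without matching
it exactly. It also assumes % diff is the absolute difference taken as
a percentage of the observed count, which is what the posted figures
are consistent with. Function names are illustrative.

```python
# Rows copied from the decile table: (observed y=1, expected y=1, group size).
rows = [
    (115, 190, 14359),
    (194, 315, 14358),
    (305, 419, 14359),
    (443, 560, 14359),
    (671, 704, 14361),
    (864, 904, 14355),
    (1379, 1213, 14359),
    (2122, 1827, 14358),
    (3615, 3207, 14359),
    (6175, 6543, 14358),
]


def hl_statistic(rows):
    """Hosmer-Lemeshow chi2: sum over groups of (O - E)^2 / (E * (1 - E/n))."""
    return sum((o - e) ** 2 / (e * (1 - e / n)) for o, e, n in rows)


def pct_diff(o, e):
    """Absolute difference as a whole-number percentage of the observed count."""
    return round(100 * abs(o - e) / o)


hl = hl_statistic(rows)
pct = [pct_diff(o, e) for o, e, _ in rows]

print(f"HL chi2 from rounded table counts: {hl:.1f}")
print("% diff by decile:", pct)
```

Because each group holds about 14,359 observations, even a 5% gap
between observed and expected counts contributes to the statistic, so a
tiny p-value here says more about sample size than about how badly the
model fits.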

So this is not a particularly well calibrated model.  If good
calibration is really important for your purposes, you might try to
fix it by identifying whether some of your linear terms would be
better represented by low degree polynomials or splines, or by
including some interaction effects (or taking some out!)  (Graphical
exploration would be a good guide.)  Or, perhaps there are some
important missing predictors.
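
To make the spline suggestion concrete, here is a minimal Python sketch
of a linear spline basis for one predictor. It is only an illustration
of the idea, not the model from this thread: the function name, the
predictor values, and the knots at 30 and 50 are all hypothetical.

```python
def linear_spline_basis(x, knots):
    """Return [x, (x - k1)+, (x - k2)+, ...] for a single value x.

    Each extra term is zero below its knot and rises linearly above it,
    so a coefficient on that term lets the slope change at the knot.
    """
    return [x] + [max(0.0, x - k) for k in knots]


# Hypothetical knots and predictor values, for illustration only.
knots = [30, 50]
for value in (25, 40, 60):
    print(value, linear_spline_basis(value, knots))
```

Entering these terms in place of the single linear term lets the
log-odds bend at each knot, which is one way a model can track risk
more closely in ranges where a straight line fits poorly.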

Of course, you have very good discrimination (area under the ROC
curve, AROC = 0.83), and if
your purposes are more about understanding factors that influence the
outcome, calibration may not be so important.  On the other hand, if
you intend to use the predicted probabilities themselves, then you
need to improve your model.
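
The difference between discrimination and calibration can be seen in a
small Python sketch. The data are a hypothetical toy example, not from
this thread, and the function name is illustrative. The area under the
ROC curve depends only on how the predictions rank events against
non-events, so any monotone distortion of the probabilities leaves it
unchanged, while the expected number of events (a calibration quantity)
can move a long way.

```python
def auc(y, p):
    """Chance that a random event is ranked above a random non-event.

    Ties count one half. Uses a plain pairwise comparison, which is fine
    for small illustrative data.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

# A monotone distortion: same ranking, very different probabilities.
p_distorted = [pi ** 3 for pi in p]

auc_original = auc(y, p)
auc_distorted = auc(y, p_distorted)
expected_original = sum(p)
expected_distorted = sum(p_distorted)

print(f"AUC original:  {auc_original:.4f}")
print(f"AUC distorted: {auc_distorted:.4f}")
print(f"observed events: {sum(y)}")
print(f"expected events, original:  {expected_original:.2f}")
print(f"expected events, distorted: {expected_distorted:.2f}")
```

Both sets of predictions discriminate equally well, but only the first
comes close to the observed number of events. That is the sense in
which a model can have a good AROC and still be poorly calibrated.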

Hope this is helpful.

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA