
Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it

From: Clyde B Schechter <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: interpretting the estat gof commands and Hosmer Lemeshow version of it
Date: Mon, 19 Sep 2011 22:05:41 +0000

```
Doug Hess posted a question about the output of -estat gof- with and without the group option.

. estat gof
number of observations = 143585
number of covariate patterns = 108638
Pearson chi2(108575) = 106784.16
Prob > chi2 = 0.9999

With 143,585 observations spread over 108,638 covariate patterns (about 1.3 observations per pattern), nearly all of the patterns will be singletons. A few might be pairs, triplets, or quartets, but I doubt that even 1% of the patterns will be instantiated even four times.  Your "cells" are simply too small for the chi-square test to work, even though you have scads of them.  (More formally, the test statistic has an approximate chi-square distribution, but the approximation is poor if there are substantial numbers of cells with small expected values.)

. estat gof, g(10) table
number of observations = 143585
number of groups = 10
Hosmer-Lemeshow chi2(8) = 322.31
Prob > chi2 = 0.0000

Decile  Pred Prob  Obs y=1  Exp y=1    Total  Diff  % diff
     1      0.019      115      190   14,359    75     65%
     2      0.025      194      315   14,358   121     62%
     3      0.034      305      419   14,359   114     37%
     4      0.044      443      560   14,359   117     26%
     5      0.055      671      704   14,361    33      5%
     6      0.072      864      904   14,355    40      5%
     7      0.100    1,379    1,213   14,359   166     12%
     8      0.163    2,122    1,827   14,358   295     14%
     9      0.302    3,615    3,207   14,359   408     11%
    10      0.856    6,175    6,543   14,358   368      6%
  Sum=              15,883   15,883  143,585

Here your problem is that your sample is huge.  With 143,585 observations distributed over 10 cells, you are running a test with the power to detect even minuscule departures from perfect fit.  I think you are better off ignoring the p-value and just looking at the fit as indicated in the % diff column of your output.  You are pretty good in the middle, and not bad at the upper end of the predicted risk distribution.  But you are seriously overpredicting in the lower range: in the bottom four deciles, the expected number of events exceeds the observed number by 26% to 65%.

So this is not a particularly well-calibrated model.  If good calibration is really important for your purposes, you might try to fix it by identifying whether some of your linear terms would be better represented by low-degree polynomials or splines, or by including some interaction effects (or taking some out!).  Graphical exploration would be a good guide.  Or perhaps there are some important missing predictors.

Of course, you have very good discrimination, with an area under the ROC curve (AROC) of 0.83, and if your purposes are more about understanding factors that influence the outcome, calibration may not be so important.  On the other hand, if you intend to use the predicted probabilities themselves, then you need to improve your model.

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

```
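The grouped statistic and the % diff column in the table above can be recomputed from the observed and expected counts alone. Below is a minimal sketch in Python rather than Stata. It assumes the usual Hosmer-Lemeshow form, summing (O - E)^2 / (E * (1 - E/n)) over the ten groups. Because the expected counts in the table are rounded to whole numbers, the result only approximates the 322.31 that Stata reports. The function and variable names are illustrative, not part of any Stata or Python library.

```python
# Counts copied from the -estat gof, g(10) table- output above.
observed = [115, 194, 305, 443, 671, 864, 1379, 2122, 3615, 6175]
expected = [190, 315, 419, 560, 704, 904, 1213, 1827, 3207, 6543]  # rounded in the table
totals = [14359, 14358, 14359, 14359, 14361, 14355, 14359, 14358, 14359, 14358]


def hosmer_lemeshow(obs, exp, n):
    """Sum of (O - E)^2 / (E * (1 - E/n)) over the groups."""
    return sum((o - e) ** 2 / (e * (1 - e / m)) for o, e, m in zip(obs, exp, n))


def percent_diff(obs, exp):
    """Signed (expected - observed) as a percentage of observed, per group."""
    return [100 * (e - o) / o for o, e in zip(obs, exp)]


hl = hosmer_lemeshow(observed, expected, totals)
pct = percent_diff(observed, expected)

# Degrees of freedom for the grouped test: number of groups minus 2.
print(f"Hosmer-Lemeshow chi2({len(observed) - 2}) is roughly {hl:.1f}")
for decile, (o, e, p) in enumerate(zip(observed, expected, pct), start=1):
    print(f"{decile:>2}  obs={o:>5}  exp={e:>5}  (exp - obs)/obs = {p:+.0f}%")
```

The sign is kept so that the direction of miscalibration is visible: a positive value means the model expects more events than were observed in that decile, a negative value means it expects fewer. The absolute values match the % diff column, which is computed relative to the observed count.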