Re: st: Fitting probit - estat gof puzzling results

From: "Matthew Baldwin, MD" <[email protected]>
To: [email protected]
Subject: Re: st: Fitting probit - estat gof puzzling results
Date: Sun, 28 Aug 2011 15:17:50 -0400
I'm a physician, not a statistician, but as I understand it you're
measuring model discrimination with lroc and model calibration with
estat gof. There's a trade-off: as one gets better, the other gets
worse. Most statistics textbooks on building and testing prediction
models review this phenomenon. I remember reading Clinical Prediction
Models by Steyerberg a while ago and thought it explained this quite
well.
Quoting J Gonzalez <[email protected]>:
Dear Stata list members
I am trying to estimate a probit model to understand which variables
influence (and how they do it) the decision of an individual to
apply for a health prevention program.
I have a dataset (nearly 40 thousand obs) with information about
applicants and non applicants, containing variables with
individual's information on demographics, health status and health
related risk factors, as well as socioeconomic indicators
(education, employment and housing information). With this
information I am trying to fit a probit model to estimate the
individual's probability of applying for the program, given
variables like age, educ, health status indicators and so on
(theoretically, those variables might affect the decision to apply).
I am not an expert, so I checked the Stata probit postestimation
examples in the base reference manual, and I found several commands
useful to test the goodness of fit of my model, and here's how it
estat clas, all
Correctly classified = 90.02%
Sensitivity = 93.94%
Specificity = 83.31%
So the model seems to have quite good classification power (though a
little better for the positive-outcome cases).
Then I looked at the prediction and it looks like this (mean quite similar).
predict p
sum p apply
    Variable |       Obs        Mean    Std. Dev.        Min         Max
-------------+----------------------------------------------------------
           p |     42450    .6306243    .3935977    .0002053    .9999337
       apply |     42451    .6306094    .4826455           0           1
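As a rough cross-check, the classification figures and the sample mean of apply quoted above should be mutually consistent, because overall accuracy is a prevalence-weighted average of sensitivity and specificity. The sketch below uses only numbers reported in the post; the variable names are illustrative, and the majority-class baseline is added purely for comparison.

```python
# Figures reported in the post (estat classification and summarize output).
sensitivity = 0.9394    # share of apply == 1 cases classified as positive
specificity = 0.8331    # share of apply == 0 cases classified as negative
prevalence = 0.6306094  # sample mean of apply

# Overall accuracy is the prevalence-weighted mix of the two rates.
accuracy = sensitivity * prevalence + specificity * (1 - prevalence)

# Comparison point: always predicting the more common outcome.
baseline = max(prevalence, 1 - prevalence)

print(f"implied accuracy : {accuracy:.4f}")
print(f"majority baseline: {baseline:.4f}")
```

The implied accuracy comes out at about 0.90, in line with the reported "Correctly classified" figure, while always predicting an application would already be right about 63% of the time.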
Then, using lroc
area under ROC curve = 0.9488
So, following the Stata base reference manual, "The greater the
predictive power, the more bowed the curve, and hence the area beneath
the curve is often used as a measure of the predictive power. A model
with no predictive power has area 0.5; a perfect model has area 1".
Hence, I guess the model is quite good, because the area under the ROC
curve in my model is much closer to that of a perfect model than to
that of a model without predictive power.
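One way to see what the area under the ROC curve does and does not measure: it depends only on how the predictions rank the observations, not on the probability values themselves. The sketch below is a minimal illustration on made-up toy data (not the data from this thread); the auc helper and all variable names are hypothetical.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive case
    has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, not taken from the post.
labels = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
probs = [0.05, 0.10, 0.30, 0.35, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95]

# A monotone distortion keeps the ordering but changes the probabilities.
distorted = [p ** 3 for p in probs]

auc_original = auc(probs, labels)
auc_distorted = auc(distorted, labels)
mean_original = sum(probs) / len(probs)
mean_distorted = sum(distorted) / len(distorted)
observed_rate = sum(labels) / len(labels)

print(f"AUC original / distorted: {auc_original:.4f} / {auc_distorted:.4f}")
print(f"mean prediction original / distorted: "
      f"{mean_original:.4f} / {mean_distorted:.4f}")
print(f"observed rate: {observed_rate:.4f}")
```

The two AUC values are identical even though the distorted predictions sit well below the observed rate, so a high AUC on its own does not show that predicted probabilities match observed frequencies.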
HOWEVER, estat gof does not seem to tell the same story
Actually, it is the opposite story, because the null hypothesis is
soundly rejected,
indicating that the model does not fit the data (am I right?).
. estat gof
Probit model for apply, goodness-of-fit test
number of observations = 42450
number of covariate patterns = 42409
Pearson chi2(42245) = 58810.50
Prob > chi2 = 0.0000
. estat gof, group(10) table
Probit model for apply, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
| Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
| 1 | 0.0293 | 97 | 50.2 | 4148 | 4194.8 | 4245 |
| 2 | 0.0881 | 267 | 234.1 | 3978 | 4010.9 | 4245 |
| 3 | 0.2731 | 552 | 674.2 | 3693 | 3570.8 | 4245 |
| 4 | 0.7419 | 2120 | 2203.3 | 2125 | 2041.7 | 4245 |
| 5 | 0.8664 | 3445 | 3475.3 | 800 | 769.7 | 4245 |
| 6 | 0.9136 | 3806 | 3787.5 | 439 | 457.5 | 4245 |
| 7 | 0.9445 | 4004 | 3947.5 | 241 | 297.5 | 4245 |
| 8 | 0.9689 | 4092 | 4062.5 | 153 | 182.5 | 4245 |
| 9 | 0.9893 | 4170 | 4157.5 | 75 | 87.5 | 4245 |
| 10 | 1.0000 | 4217 | 4228.1 | 28 | 16.9 | 4245 |
number of observations = 42450
number of groups = 10
Hosmer-Lemeshow chi2(8) = 109.99
Prob > chi2 = 0.0000
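The Hosmer-Lemeshow statistic can be recomputed by hand from the grouped table, which also shows how large the group-level gaps are. This is a sketch using the rounded counts printed above, so the result will differ slightly from Stata's; the closed-form tail probability used here is valid only for even degrees of freedom, and the helper name chi2_sf_even_df is illustrative.

```python
import math

# (Obs_1, Exp_1, Obs_0, Exp_0) for the ten groups, copied from the table.
groups = [
    (97, 50.2, 4148, 4194.8),
    (267, 234.1, 3978, 4010.9),
    (552, 674.2, 3693, 3570.8),
    (2120, 2203.3, 2125, 2041.7),
    (3445, 3475.3, 800, 769.7),
    (3806, 3787.5, 439, 457.5),
    (4004, 3947.5, 241, 297.5),
    (4092, 4062.5, 153, 182.5),
    (4170, 4157.5, 75, 87.5),
    (4217, 4228.1, 28, 16.9),
]

# Sum of (observed - expected)^2 / expected over both outcome columns.
chi2 = sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
           for o1, e1, o0, e0 in groups)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

df = len(groups) - 2  # 10 groups give 8 degrees of freedom
p_value = chi2_sf_even_df(chi2, df)

print(f"Hosmer-Lemeshow chi2({df}) = {chi2:.2f}, p = {p_value:.3g}")
for i, (o1, e1, o0, e0) in enumerate(groups, start=1):
    n = o1 + o0
    print(f"group {i:2d}: observed {o1 / n:.4f}  expected {e1 / n:.4f}")
```

With the rounded expected counts this gives a statistic of about 109.9, close to the reported 109.99. The per-group lines put the largest gap in group 3, roughly 13.0% observed against 15.9% expected. With about 4,245 observations per group, gaps of this size are enough to produce a very small p-value.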
Why might something like this happen? Classification and predictive
power after a probit model look quite good (actually very good, I
think), but the goodness-of-fit test indicates that the model does not
fit the data at all.
I am really clueless here, so I would really appreciate any
suggestion on why it might happen, and most importantly, how should
I proceed on testing it and/or modelling.
Best regards,
* For searches and help try:
Matthew Baldwin, MD
Clinical Fellow
Department of Pulmonary, Allergy, and Critical Care Medicine
New York Presbyterian Hospital
Columbia University College of Physicians and Surgeons
P: 917-899-2187
C: 917-846-7560