Re: st: Fitting probit - estat gof puzzling results
From: "Matthew Baldwin, MD" <[email protected]>
To: [email protected]
Subject: Re: st: Fitting probit - estat gof puzzling results
Date: Sun, 28 Aug 2011 15:17:50 -0400
I'm a physician, not a statistician, but as I understand it you're
measuring model discrimination with lroc and model calibration with
estat gof. There's a trade-off: as one gets better, the other gets
worse. Most statistics textbooks on building and testing prediction
models review this phenomenon. I remember reading Clinical Prediction
Models by Steyerberg a while ago and thought it explained this quite
well.
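For example, after fitting the probit you can look at both properties
in one pass. A minimal sketch (the covariates below are placeholders,
not the actual model):

. probit apply age educ            // hypothetical covariates
. lroc                             // discrimination: area under the ROC curve
. estat gof, group(10) table       // calibration: Hosmer-Lemeshow test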
Matt
Quoting J Gonzalez <[email protected]>:
Dear Stata list members
I am trying to estimate a probit model to understand which variables
influence an individual's decision to apply for a health prevention
program, and how they do so.
I have a dataset (nearly 40,000 obs) with information about
applicants and non-applicants, containing individual-level variables
on demographics, health status, and health-related risk factors, as
well as socioeconomic indicators (education, employment, and housing
information). With this information I am trying to fit a probit
model to estimate an individual's probability of applying for the
program, given variables like age, educ, health status indicators,
and so on (theoretically, those variables might affect the decision
to apply).
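For concreteness, the fitted model is of the form

. probit apply age educ health_status employed

where the variable names above are placeholders for my actual covariates.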
I am not an expert, so I checked the Stata probit postestimation
examples in the Base Reference Manual and found several commands
useful for testing the goodness of fit of my model. Here is how it
looks.
__________________________________________________________
estat clas, all
Correctly classified = 90.02%
Sensitivity = 93.94%
Specificity = 83.31%
So the classification power seems quite good (though a little better
for the positive-outcome cases).
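As I understand it, estat clas classifies at a 0.5 probability cutoff
by default, and sensitivity and specificity move in opposite
directions as that cutoff changes, for example:

. estat clas                   // default is cutoff(0.5)
. estat clas, cutoff(0.63)     // e.g., a cutoff near the sample mean of apply

So the figures above depend on that choice.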
__________________________________________________________
Then I looked at the predictions, and they look like this (the means
are quite similar):
predict p
sum p apply
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |     42450    .6306243    .3935977   .0002053   .9999337
       apply |     42451    .6306094    .4826455          0          1
__________________________________________________________
Then, using lroc
area under ROC curve = 0.9488
So, following the Stata Base Reference Manual: "The greater the
predictive power, the more bowed the curve, and hence the area
beneath the curve is often used as a measure of the predictive
power. A model with no predictive power has area 0.5; a perfect
model has area 1." Hence, I guess the model is quite good, because
the area under the ROC curve for my model is much closer to that of
a perfect model than to one with no predictive power.
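I also gather that lsens, run after the same estimation, graphs
sensitivity and specificity against the probability cutoff, which
shows the trade-off between the two:

. lsens       // sensitivity and specificity versus probability cutoff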
__________________________________________________________
HOWEVER, estat gof does not tell the same story.
Actually, it tells the opposite story, because the null hypothesis
is soundly rejected, indicating that the model does not fit the data
(am I right?).
. estat gof
Probit model for apply, goodness-of-fit test
number of observations = 42450
number of covariate patterns = 42409
Pearson chi2(42245) = 58810.50
Prob > chi2 = 0.0000
. estat gof, group(10) table
Probit model for apply, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
+----------------------------------------------------------+
| Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
|-------+--------+-------+--------+-------+--------+-------|
| 1 | 0.0293 | 97 | 50.2 | 4148 | 4194.8 | 4245 |
| 2 | 0.0881 | 267 | 234.1 | 3978 | 4010.9 | 4245 |
| 3 | 0.2731 | 552 | 674.2 | 3693 | 3570.8 | 4245 |
| 4 | 0.7419 | 2120 | 2203.3 | 2125 | 2041.7 | 4245 |
| 5 | 0.8664 | 3445 | 3475.3 | 800 | 769.7 | 4245 |
|-------+--------+-------+--------+-------+--------+-------|
| 6 | 0.9136 | 3806 | 3787.5 | 439 | 457.5 | 4245 |
| 7 | 0.9445 | 4004 | 3947.5 | 241 | 297.5 | 4245 |
| 8 | 0.9689 | 4092 | 4062.5 | 153 | 182.5 | 4245 |
| 9 | 0.9893 | 4170 | 4157.5 | 75 | 87.5 | 4245 |
| 10 | 1.0000 | 4217 | 4228.1 | 28 | 16.9 | 4245 |
+----------------------------------------------------------+
number of observations = 42450
number of groups = 10
Hosmer-Lemeshow chi2(8) = 109.99
Prob > chi2 = 0.0000
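To see where the lack of fit arises, I suppose one could rebuild
this table by hand and compare the observed application rate with
the mean predicted probability within each decile. A sketch of what
I mean, reusing p from predict above:

. xtile decile = p, nq(10)                   // deciles of predicted probability
. tabstat apply p, by(decile) stat(mean n)   // observed rate vs. mean prediction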
__________________________________________________________
Why might something like this happen, that the classification and
predictive power of a probit model look quite good (actually very
good, I think), but the goodness-of-fit test indicates that the
model does not fit the data at all?
I am really clueless here, so I would appreciate any suggestions on
why this might happen and, most importantly, on how I should proceed
with testing and/or modeling.
Best regards,
JG
Matthew Baldwin, MD
Clinical Fellow
Department of Pulmonary, Allergy, and Critical Care Medicine
New York Presbyterian Hospital
Columbia University College of Physicians and Surgeons
P: 917-899-2187
C: 917-846-7560
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/