Re: st: Fitting probit - estat gof puzzling results


From   Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Fitting probit - estat gof puzzling results
Date   Mon, 29 Aug 2011 15:05:54 +0000

Matthew Baldwin wrote: "but as I understand it you're 
measuring model discrimination with lroc and model calibration with 
estat gof. There's a trade-off, as one gets better, the other gets 
worse."

The first part is correct.  But there is no necessary trade-off between calibration and discrimination.  In fact, it is not hard to concoct toy examples where both are perfect (i.e., area under the ROC curve = 1.0 and Hosmer-Lemeshow chi-square = 0).

The area under the ROC curve depends only on the ordinal properties of the predictor: any strictly increasing transform will give the identical result.  The Hosmer-Lemeshow calibration statistic, by contrast, looks at how closely the predicted and observed probabilities match.  As such, it is really a test of whether the underlying statistical model is a good specification of the data generating process.
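
To make the distinction concrete, here is a minimal sketch in Python (not Stata), using made-up predictions and outcomes rather than anyone's real data. The helper auc() and the cubing transform are illustrative assumptions; the point is only that a strictly increasing transform leaves the ranking, and hence the area under the ROC curve, untouched while pulling the predicted probabilities away from the observed frequency.

```python
def auc(scores, outcomes):
    """Area under the ROC curve: share of (1, 0) pairs ranked correctly, ties count half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and 0/1 outcomes, invented for illustration.
p_hat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9]
y = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# Cubing is strictly increasing on [0, 1], so it preserves the ordering of p_hat.
cubed = [p ** 3 for p in p_hat]

auc_original = auc(p_hat, y)
auc_cubed = auc(cubed, y)

# Crude calibration check: expected number of 1s versus the observed number.
observed_ones = sum(y)
expected_original = sum(p_hat)
expected_cubed = sum(cubed)

print("AUC:", auc_original, auc_cubed)
print("observed 1s:", observed_ones)
print("expected 1s:", round(expected_original, 3), round(expected_cubed, 3))
```

Both AUC values are identical, but the cubed predictions expect far fewer 1-outcomes than were observed, which is exactly the kind of mismatch a calibration statistic picks up.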

What J. Gonzalez is running into is that a sample of more than 40,000 gives enormous power to detect even small departures from the form of the probit model.  If you look at the table of observed and expected numbers (and disregard the p-value, which is "on steroids" due to the large sample), to my eye they actually look very good:

> +----------------------------------------------------------+
> | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
> |-------+--------+-------+--------+-------+--------+-------|
> |     1 | 0.0293 |    97 |   50.2 |  4148 | 4194.8 |  4245 |
> |     2 | 0.0881 |   267 |  234.1 |  3978 | 4010.9 |  4245 |
> |     3 | 0.2731 |   552 |  674.2 |  3693 | 3570.8 |  4245 |
> |     4 | 0.7419 |  2120 | 2203.3 |  2125 | 2041.7 |  4245 |
> |     5 | 0.8664 |  3445 | 3475.3 |   800 |  769.7 |  4245 |
> |-------+--------+-------+--------+-------+--------+-------|
> |     6 | 0.9136 |  3806 | 3787.5 |   439 |  457.5 |  4245 |
> |     7 | 0.9445 |  4004 | 3947.5 |   241 |  297.5 |  4245 |
> |     8 | 0.9689 |  4092 | 4062.5 |   153 |  182.5 |  4245 |
> |     9 | 0.9893 |  4170 | 4157.5 |    75 |   87.5 |  4245 |
> |    10 | 1.0000 |  4217 | 4228.1 |    28 |   16.9 |  4245 |
> +----------------------------------------------------------+
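
The sample-size effect can be checked directly from the table. The sketch below (Python, standard library only) recomputes a Hosmer-Lemeshow-style chi-square from the observed and expected counts above, then repeats the calculation for a hypothetical sample one-tenth the size with exactly the same proportions. The one-tenth sample is an assumption made purely for illustration, as is the use of 8 degrees of freedom (number of groups minus 2, the usual convention when the model is assessed on its own estimation sample).

```python
import math

# (observed 1s, expected 1s, observed 0s, expected 0s) per decile, copied from the table.
table = [
    (97, 50.2, 4148, 4194.8),
    (267, 234.1, 3978, 4010.9),
    (552, 674.2, 3693, 3570.8),
    (2120, 2203.3, 2125, 2041.7),
    (3445, 3475.3, 800, 769.7),
    (3806, 3787.5, 439, 457.5),
    (4004, 3947.5, 241, 297.5),
    (4092, 4062.5, 153, 182.5),
    (4170, 4157.5, 75, 87.5),
    (4217, 4228.1, 28, 16.9),
]

def hosmer_lemeshow(rows):
    """Sum of (observed - expected)^2 / expected over both outcome columns of every group."""
    return sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0 for o1, e1, o0, e0 in rows)

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

df = len(table) - 2  # assumed: groups minus 2

hl_full = hosmer_lemeshow(table)
p_full = chi2_upper_tail_even_df(hl_full, df)

# Hypothetical: identical proportions, but every count divided by 10.
tenth = [tuple(v / 10 for v in row) for row in table]
hl_tenth = hosmer_lemeshow(tenth)
p_tenth = chi2_upper_tail_even_df(hl_tenth, df)

print(f"full sample : chi2 = {hl_full:.1f}, p = {p_full:.3g}")
print(f"tenth sample: chi2 = {hl_tenth:.1f}, p = {p_tenth:.3g}")
```

Because the statistic scales in proportion to the sample size when the proportions are held fixed, the same pattern of fit that is overwhelmingly "significant" at n > 40,000 would not be rejected at the 0.05 level in a sample of about 4,000.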

In the two lowest deciles the model is under-predicting 1-outcomes (97 observed vs. 50.2 expected, and 267 vs. 234.1), and there is a substantial over-prediction in the third decile (552 vs. 674.2).  But after that, the observed and expected numbers are as close as one would want for most practical purposes.

So the process that generated JG's data is not exactly a probit model based on the variables he or she chose.  The model is slightly mis-specified, but it's pretty close, except at the low end.  What to do next depends on the intended use of the results.  Suppose the plan is to target those identified as low-probability users with some kind of intervention to get them to use the program, and there are appreciable downsides to failing to provide the program to those who won't sign up for it (lowest 2 deciles) or to providing the program to those who will (third decile).  Then JG might want to refine the model: explore the relationships between the predictor variables and outcomes graphically [especially -lowess, logit-] to see if there is some non-linearity at the low end that might be captured with splines or quadratic terms, or see if there is an interaction term that could be added (or deleted!) to account for the difference.
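
As a rough, non-Stata stand-in for that kind of graphical check, the Python sketch below bins a continuous predictor into equal-size groups and computes an empirical logit in each bin; plotting those points against the bin means is a crude substitute for a lowess smooth on the logit scale. Everything here is an assumption for illustration: the function empirical_logits() is hypothetical, the data are simulated (they are not JG's), and the 0.5 added to each count is just a common continuity correction to avoid taking the log of zero.

```python
import math
import random

def empirical_logits(x, y, n_bins):
    """Sort by x, cut into n_bins near-equal groups, return (mean x, empirical logit) per group."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    result = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)  # last bin takes any remainder
        chunk = pairs[start:end]
        n = len(chunk)
        k = sum(yi for _, yi in chunk)          # number of 1-outcomes in the bin
        mean_x = sum(xi for xi, _ in chunk) / n
        result.append((mean_x, math.log((k + 0.5) / (n - k + 0.5))))
    return result

# Simulated data: the true log-odds are curved in x, so a linear term alone would misfit.
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(5000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(xi + 0.8 * xi ** 2 - 1))) else 0 for xi in x]

for mean_x, logit in empirical_logits(x, y, 10):
    print(f"{mean_x:6.2f}  {logit:6.2f}")
```

If the printed logits bend rather than fall on a straight line in the bin means, that is a hint that a quadratic or spline term for the predictor may be worth trying.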

But given the general field of application of the problem, my guess is that JG's existing model is already adequate for the uses to which it will be put.  It's already a better fit than we usually see in this kind of application.

Moral of the story: don't over-react to p-values generated from huge samples.  Think about the fit of the model in pragmatic terms.

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

