

From: Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Fitting probit - estat gof puzzling results
Date: Mon, 29 Aug 2011 15:05:54 +0000

Matthew Baldwin wrote: "but as I understand it you're measuring model discrimination with lroc and model calibration with estat gof. There's a trade-off, as one gets better, the other gets worse."

The first part is correct. But there is no necessary trade-off between calibration and discrimination. In fact, it is not hard to concoct toy examples where both are perfect (i.e. AROC = 1.0 and Hosmer-Lemeshow Chi-Square = 0). The area under the ROC curve depends only on the ordinal properties of the predictor: any strictly increasing transform of the predicted values will give the identical result. The Hosmer-Lemeshow calibration statistic, by contrast, looks at how closely the predicted probabilities match the observed proportions. As such, it is really a test of whether the underlying statistical model is a good specification of the data generating process.

What J. Gonzalez is experiencing is the fact that in a sample of more than 40,000 there is enormous power to detect small departures from the form of the probit model.
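To make the ordinal point concrete, here is a minimal sketch in Python (not Stata, and not part of the original thread). The eight predicted probabilities and outcomes are made up for illustration, and `auc` is a hypothetical helper that computes the area under the ROC curve as the share of (1-outcome, 0-outcome) pairs that the score ranks correctly. Cubing the probabilities is a strictly increasing transform, so the AUC should be unchanged, while the average predicted probability drifts away from the observed event rate.

```python
def auc(scores, outcomes):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean(values):
    return sum(values) / len(values)


# Made-up toy data (an assumption for illustration, not JG's data)
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# A strictly increasing transform: the ordering of cases is preserved
cubed = [p ** 3 for p in probs]

auc_original = auc(probs, outcomes)
auc_transformed = auc(cubed, outcomes)

# Crude calibration check: mean predicted probability vs. observed rate
observed_rate = mean(outcomes)
gap_original = mean(probs) - observed_rate
gap_transformed = mean(cubed) - observed_rate

print("AUC, original scores:   ", auc_original)
print("AUC, cubed scores:      ", auc_transformed)
print("calibration gap, original:", round(gap_original, 4))
print("calibration gap, cubed:   ", round(gap_transformed, 4))
```

The discrimination measure cannot tell the two sets of predictions apart, but any statistic that compares predicted with observed frequencies can, which is why the two diagnostics answer different questions.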
If you look at the table of observed and expected numbers (and disregard the p-value which is "on steroids" due to the large sample), to my eye they actually look very good:

> +----------------------------------------------------------+
> | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
> |-------+--------+-------+--------+-------+--------+-------|
> |     1 | 0.0293 |    97 |   50.2 |  4148 | 4194.8 |  4245 |
> |     2 | 0.0881 |   267 |  234.1 |  3978 | 4010.9 |  4245 |
> |     3 | 0.2731 |   552 |  674.2 |  3693 | 3570.8 |  4245 |
> |     4 | 0.7419 |  2120 | 2203.3 |  2125 | 2041.7 |  4245 |
> |     5 | 0.8664 |  3445 | 3475.3 |   800 |  769.7 |  4245 |
> |-------+--------+-------+--------+-------+--------+-------|
> |     6 | 0.9136 |  3806 | 3787.5 |   439 |  457.5 |  4245 |
> |     7 | 0.9445 |  4004 | 3947.5 |   241 |  297.5 |  4245 |
> |     8 | 0.9689 |  4092 | 4062.5 |   153 |  182.5 |  4245 |
> |     9 | 0.9893 |  4170 | 4157.5 |    75 |   87.5 |  4245 |
> |    10 | 1.0000 |  4217 | 4228.1 |    28 |   16.9 |  4245 |
> +----------------------------------------------------------+

In the two lowest deciles his model is under-predicting 1-outcomes, and there is a substantial over-prediction in the third decile. But after that, the observed and expected numbers are as close as one would want for most practical purposes.

So, the process that generated JG's data is not exactly a probit model based on the variables he or she chose. The model is slightly mis-specified. But it's pretty close, except at the low end. What to do next depends on the intended use of the results.
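The arithmetic behind this reading of the table can be sketched in a few lines of Python (again not Stata, and not from the original thread). The rows are copied from the table quoted above; because they are rounded, the recomputed statistic may differ slightly from what -estat gof- reported. The sketch assumes the usual Hosmer-Lemeshow form, summing (observed - expected)^2 / expected over both outcome columns, with groups - 2 = 8 degrees of freedom, for which the 5% critical value is about 15.51. The one-tenth-size sample is purely hypothetical: it keeps every observed and expected proportion fixed and only shrinks the counts.

```python
# Rows copied from the quoted table:
# (group, observed 1s, expected 1s, observed 0s, expected 0s)
table = [
    (1, 97, 50.2, 4148, 4194.8),
    (2, 267, 234.1, 3978, 4010.9),
    (3, 552, 674.2, 3693, 3570.8),
    (4, 2120, 2203.3, 2125, 2041.7),
    (5, 3445, 3475.3, 800, 769.7),
    (6, 3806, 3787.5, 439, 457.5),
    (7, 4004, 3947.5, 241, 297.5),
    (8, 4092, 4062.5, 153, 182.5),
    (9, 4170, 4157.5, 75, 87.5),
    (10, 4217, 4228.1, 28, 16.9),
]


def hl_terms(rows):
    """Each group's contribution to the Hosmer-Lemeshow chi-square."""
    return {g: (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
            for g, o1, e1, o0, e0 in rows}


terms = hl_terms(table)
chi2 = sum(terms.values())

# Practical size of the misfit: observed minus expected share of 1s per group
rate_gap = {g: (o1 - e1) / (o1 + o0) for g, o1, e1, o0, e0 in table}
worst_gap = max(abs(v) for v in rate_gap.values())

# Hypothetical sample one tenth the size with identical proportions
scaled = [(g, o1 / 10, e1 / 10, o0 / 10, e0 / 10)
          for g, o1, e1, o0, e0 in table]
chi2_small = sum(hl_terms(scaled).values())

CRIT_5PCT_DF8 = 15.51  # approximate 5% critical value, chi-square with 8 df

print("HL chi-square from the rounded table:", round(chi2, 1))
print("largest |observed - expected| share of 1s:", round(worst_gap, 4))
print("same proportions, one tenth the sample:", round(chi2_small, 1))
print("5% critical value (8 df):", CRIT_5PCT_DF8)
```

With the proportions held fixed, the statistic scales in direct proportion to the sample size, so a degree of misfit that would pass unremarked in a sample of about 4,000 is flagged decisively in one of more than 40,000.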
If the plan is to target those identified as low-probability users with some kind of intervention to get them to use the program, and if there are appreciable downsides to failing to provide the program to those who won't sign up for the program (lowest 2 deciles) or to providing the program to those who will (third decile), then JG might want to refine the model: explore the relationships between the predictor variables and outcomes graphically [especially -lowess, logit-] to see if there is some non-linearity at the low end that might be captured with splines or quadratic terms, or see if there is an interaction term that could be added (or deleted!) to account for the difference.

But given the general field of application of the problem, my guess is that JG's existing model is already adequate for the uses to which it will be put. It's already a better fit than we usually see in this kind of application.

Moral of the story: don't over-react to p-values generated from huge samples. Think about the fit of the model in pragmatic terms.

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
