Re: st: Fitting probit - estat gof puzzling results


From   J Gonzalez <jgonzalez.1981@yahoo.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Fitting probit - estat gof puzzling results
Date   Tue, 30 Aug 2011 00:49:48 +0100 (BST)

Thank you very much for your help (Matthew Baldwin and Clyde Schechter)

I followed Clyde's recommendation and used -lowess, logit- to examine graphically what was going on. I found non-linearities in a couple of variables, so I added quadratic terms, and now the estat gof 10-group table looks as follows:
  +----------------------------------------------------------+
  | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
  |-------+--------+-------+--------+-------+--------+-------|
  |     1 | 0.0968 |   225 |  222.5 |  4021 | 4023.5 |  4246 |
  |     2 | 0.1928 |   635 |  607.4 |  3610 | 3637.6 |  4245 |
  |     3 | 0.3265 |  1080 | 1083.3 |  3165 | 3161.7 |  4245 |
  |     4 | 0.5803 |  1861 | 1873.9 |  2384 | 2371.1 |  4245 |
  |     5 | 0.8053 |  3097 | 3020.4 |  1148 | 1224.6 |  4245 |
  |-------+--------+-------+--------+-------+--------+-------|
  |     6 | 0.8871 |  3669 | 3610.0 |   576 |  635.0 |  4245 |
  |     7 | 0.9342 |  3861 | 3873.4 |   384 |  371.6 |  4245 |
  |     8 | 0.9665 |  4016 | 4038.1 |   229 |  206.9 |  4245 |
  |     9 | 0.9899 |  4122 | 4155.9 |   123 |   89.1 |  4245 |
  |    10 | 1.0000 |  4204 | 4229.5 |    41 |   15.5 |  4245 |
  +----------------------------------------------------------+

So I guess it now fits better than before. However, Prob > chi2 is still 0.0000, hopefully because the p-value is "on steroids" due to the large sample.
What do you think?
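
As a rough check on the "on steroids" reading, the Hosmer-Lemeshow statistic can be recomputed by hand from the two 10-group tables in this thread (the revised one above and the original one quoted in Clyde's message below). This is only a sketch under stated assumptions: the expected counts are the rounded values printed in the tables, so the result will differ slightly from what estat gof reports, and the degrees of freedom are taken as groups minus 2, the usual convention for the estimation sample. The `shrink` helper is hypothetical; it keeps every observed and expected proportion the same and only reduces the sample size.

```python
import math

# Rows copied from the two estat gof tables in this thread:
# (Obs_1, Exp_1, Obs_0, Exp_0) for each of the 10 groups.
ORIGINAL = [
    (97, 50.2, 4148, 4194.8), (267, 234.1, 3978, 4010.9),
    (552, 674.2, 3693, 3570.8), (2120, 2203.3, 2125, 2041.7),
    (3445, 3475.3, 800, 769.7), (3806, 3787.5, 439, 457.5),
    (4004, 3947.5, 241, 297.5), (4092, 4062.5, 153, 182.5),
    (4170, 4157.5, 75, 87.5), (4217, 4228.1, 28, 16.9),
]
REVISED = [
    (225, 222.5, 4021, 4023.5), (635, 607.4, 3610, 3637.6),
    (1080, 1083.3, 3165, 3161.7), (1861, 1873.9, 2384, 2371.1),
    (3097, 3020.4, 1148, 1224.6), (3669, 3610.0, 576, 635.0),
    (3861, 3873.4, 384, 371.6), (4016, 4038.1, 229, 206.9),
    (4122, 4155.9, 123, 89.1), (4204, 4229.5, 41, 15.5),
]


def hl_contributions(rows):
    """Per-group Pearson terms (O - E)^2 / E, summed over both outcomes."""
    return [(o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
            for o1, e1, o0, e0 in rows]


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    assert df % 2 == 0, "closed form used here needs even df"
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


def hl_test(rows):
    """Return (statistic, p-value), assuming df = number of groups - 2."""
    stat = sum(hl_contributions(rows))
    return stat, chi2_sf_even_df(stat, len(rows) - 2)


def shrink(rows, factor):
    """Hypothetical helper: same proportions, sample size scaled by factor."""
    return [tuple(v * factor for v in row) for row in rows]


for name, rows in [("original", ORIGINAL), ("revised", REVISED),
                   ("revised, 1/10 of the sample", shrink(REVISED, 0.1))]:
    stat, p = hl_test(rows)
    print(f"{name}: chi2 = {stat:.1f}, p = {p:.4f}")
```

Because every term is (O - E)^2 / E, the statistic scales in proportion to the sample size when the proportions are held fixed: the same relative misfit that is overwhelmingly "significant" with more than 40,000 observations is unremarkable with a tenth as many. Printing `hl_contributions(REVISED)` also shows which groups drive the statistic.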

Now, this model is being built to understand which factors might influence a person's decision to apply (or not) for the program, and how those factors influence the decision (mostly the sign of each coefficient, but probably also the relative magnitudes of the marginal effects, in order to identify the factors that affect the decision the most). The intended use of the results would be to inform the design of a strategy to increase the number of people interested in participating in the program (probably, but not necessarily, by targeting the low-probability people). Given that, I was much more concerned with the fit and calibration of the model than with its discrimination capability. So I will keep trying to improve my model; however, I am following Clyde's "Moral of the story", because every model I've tried shows a Hosmer-Lemeshow statistic with p-value = 0.0000, so I guess that is due to the large sample rather than to a complete lack of fit. Any comments on all of this will surely be illuminating for me.

Thanks again for your help.

Best regards, 

Jesús González




----- Original Message -----
From: Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Cc: 
Sent: 17:05 Monday, 29 August 2011 
Subject: Re: st: Fitting probit - estat gof puzzling results

Matthew Baldwin wrote: "but as I understand it you're 
measuring model discrimination with lroc and model calibration with 
estat gof. There's a trade-off, as one gets better, the other gets 
worse."

The first part is correct.  But there is no necessary trade-off between calibration and discrimination.  In fact, it is not hard to concoct toy examples where both are perfect (i.e. AROC = 1.0 and Hosmer-Lemeshow Chi-Square = 0.)

The area under the ROC curve actually depends only on the ordinal properties of the predictor: any monotone transform will give the identical result.  But the Hosmer-Lemeshow calibration statistic actually looks at how close the predicted probabilities and observed probabilities match.  As such, it is really a test of whether the underlying statistical model is a good specification of the data generating process.
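
The distinction can be seen in a tiny worked example. The sketch below uses invented toy data (eight observations, not anything from JG's sample) and a hand-written `auc` function based on pairwise comparisons; it is meant only to illustrate the point about monotone transforms, not to reproduce what lroc or estat gof compute internally.

```python
def auc(y, score):
    """Chance that a random 1-outcome outranks a random 0-outcome (ties = 1/2)."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Cubing is strictly increasing on [0, 1], so the ranking is unchanged.
p_cubed = [q ** 3 for q in p]

print("AUC:", auc(y, p), auc(y, p_cubed))
print("observed 1s:", sum(y))
print("expected 1s:", round(sum(p), 3), round(sum(p_cubed), 3))
```

The two AUC values are identical because only the ordering of the predictions enters the calculation. The expected number of 1-outcomes, which is what a calibration check compares with the observed count, changes substantially once the probabilities are transformed.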

What J. Gonzalez is experiencing is the fact that in a sample of more than 40,000 there is enormous power to detect small departures from the form of the probit model.  If you look at the table of observed and expected numbers (and disregard the p-value which is "on steroids" due to the large sample), to my eye they actually look very good:

> +----------------------------------------------------------+
> | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
> |-------+--------+-------+--------+-------+--------+-------|
> |     1 | 0.0293 |    97 |   50.2 |  4148 | 4194.8 |  4245 |
> |     2 | 0.0881 |   267 |  234.1 |  3978 | 4010.9 |  4245 |
> |     3 | 0.2731 |   552 |  674.2 |  3693 | 3570.8 |  4245 |
> |     4 | 0.7419 |  2120 | 2203.3 |  2125 | 2041.7 |  4245 |
> |     5 | 0.8664 |  3445 | 3475.3 |   800 |  769.7 |  4245 |
> |-------+--------+-------+--------+-------+--------+-------|
> |     6 | 0.9136 |  3806 | 3787.5 |   439 |  457.5 |  4245 |
> |     7 | 0.9445 |  4004 | 3947.5 |   241 |  297.5 |  4245 |
> |     8 | 0.9689 |  4092 | 4062.5 |   153 |  182.5 |  4245 |
> |     9 | 0.9893 |  4170 | 4157.5 |    75 |   87.5 |  4245 |
> |    10 | 1.0000 |  4217 | 4228.1 |    28 |   16.9 |  4245 |
> +----------------------------------------------------------+

In the two lowest deciles his model is under-predicting 1-outcomes, and there is a substantial over-prediction in the third decile.  But after that, the observed and expected numbers are as close as one would want for most practical purposes.

So, the process that generated JG's data is not exactly a probit model based on the variables he or she chose.  The model is slightly mis-specified.   But it's pretty close, except at the low end.  What to do next depends on the intended use of the results.   If the plan is to target those identified as low-probability users with some kind of intervention to get them to use the program, and if there are appreciable downsides to failing to provide the program to those who won't sign up for the program (lowest 2 deciles) or to providing the program to those who will (third decile), then JG might want to refine the model: explore the relationships between the predictor variables and outcomes graphically [especially -lowess, logit-] to see if there is some non-linearity at the low end that might be captured with splines or quadratic terms, or see if there is an interaction term that could be added (or deleted!) to account for the difference.  
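
To make the graphical check concrete, here is a crude stand-in for what a -lowess, logit- plot is used for: bin the observations by a predictor and look at the empirical logit of the outcome in each bin. This is a sketch, not Stata's lowess smoother, and the data are simulated (not JG's): the true logit is assumed to be 1 - x^2, so a model that is linear in x would be mis-specified. The function name `empirical_logits` and the 0.5 continuity correction are choices made for this illustration.

```python
import math
import random


def empirical_logits(x, y, n_bins=10):
    """Sort by x, cut into equal-size bins, return (mean x, empirical logit)."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    out = []
    for b in range(n_bins):
        # The last bin absorbs any leftover observations.
        stop = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[b * size:stop]
        n = len(chunk)
        s = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / n
        # Adding 0.5 to each cell keeps the logit finite for all-0 or all-1 bins.
        out.append((mean_x, math.log((s + 0.5) / (n - s + 0.5))))
    return out


# Simulated data (NOT from the thread): true logit is 1 - x^2, an inverted U.
rng = random.Random(0)
x = [rng.uniform(-2, 2) for _ in range(20000)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(1 - xi ** 2))) else 0 for xi in x]

curve = empirical_logits(x, y)
for mean_x, logit in curve:
    print(f"x = {mean_x:6.2f}   empirical logit = {logit:6.2f}")
```

If the binned logits rise and then fall (or otherwise bend) as x increases, a straight-line term in x cannot capture the relationship, which is the kind of pattern that suggests adding a quadratic term or a spline.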

But given the general field of application of the problem, my guess is that JG's existing model is already adequate for the uses to which it will be put.  It's already a better fit than we usually see in this kind of application.

Moral of the story: don't over-react to p-values generated from huge samples.  Think about the fit of the model in pragmatic terms.

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

