
From: J Gonzalez <jgonzalez.1981@yahoo.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: Fitting probit - estat gof puzzling results
Date: Fri, 26 Aug 2011 23:07:25 +0100 (BST)

Dear Stata list members,

I am trying to estimate a probit model to understand which variables influence (and how) an individual's decision to apply for a health prevention program. I have a dataset (roughly 42 thousand obs) with information about applicants and non-applicants, containing individual-level variables on demographics, health status and health-related risk factors, as well as socioeconomic indicators (education, employment and housing information).

With this information I am trying to fit a probit model to estimate the individual's probability of applying for the program, given variables like age, educ, health status indicators and so on (theoretically, those variables might affect the decision to apply).

I am not an expert, so I checked the Stata probit postestimation examples in the Base Reference Manual, where I found several commands useful for testing the goodness of fit of my model. Here is how it looks.
__________________________________________________________

estat clas, all

Correctly classified = 90.02%
Sensitivity          = 93.94%
Specificity          = 83.31%

So the classification power seems quite good (though a little better for the positive-outcome cases).
__________________________________________________________

Then I looked at the predictions, and the means are quite similar.

predict p
sum p apply

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |     42450    .6306243    .3935977   .0002053   .9999337
       apply |     42451    .6306094    .4826455          0          1
__________________________________________________________

Then, using lroc:

area under ROC curve = 0.9488

According to the Stata Base Reference Manual, "The greater the predictive power, the more bowed the curve, and hence the area beneath the curve is often used as a measure of the predictive power. A model with no predictive power has area 0.5; a perfect model has area 1". Hence I guess the model is quite good, because the area under the ROC curve in my model is much closer to that of a perfect model than to that of a model without predictive power.
__________________________________________________________

HOWEVER, estat gof does not seem to tell the same story. Actually, it tells the opposite story, because the null hypothesis is soundly rejected, indicating that the model does not fit the data (am I right?).

. estat gof

Probit model for apply, goodness-of-fit test

       number of observations =     42450
 number of covariate patterns =     42409
          Pearson chi2(42245) =  58810.50
                  Prob > chi2 =    0.0000

. estat gof, group(10) table

Probit model for apply, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

+----------------------------------------------------------+
| Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
|-------+--------+-------+--------+-------+--------+-------|
|     1 | 0.0293 |    97 |   50.2 |  4148 | 4194.8 |  4245 |
|     2 | 0.0881 |   267 |  234.1 |  3978 | 4010.9 |  4245 |
|     3 | 0.2731 |   552 |  674.2 |  3693 | 3570.8 |  4245 |
|     4 | 0.7419 |  2120 | 2203.3 |  2125 | 2041.7 |  4245 |
|     5 | 0.8664 |  3445 | 3475.3 |   800 |  769.7 |  4245 |
|-------+--------+-------+--------+-------+--------+-------|
|     6 | 0.9136 |  3806 | 3787.5 |   439 |  457.5 |  4245 |
|     7 | 0.9445 |  4004 | 3947.5 |   241 |  297.5 |  4245 |
|     8 | 0.9689 |  4092 | 4062.5 |   153 |  182.5 |  4245 |
|     9 | 0.9893 |  4170 | 4157.5 |    75 |   87.5 |  4245 |
|    10 | 1.0000 |  4217 | 4228.1 |    28 |   16.9 |  4245 |
+----------------------------------------------------------+

      number of observations =     42450
            number of groups =        10
     Hosmer-Lemeshow chi2(8) =    109.99
                 Prob > chi2 =    0.0000
__________________________________________________________

Why might something like this happen? Classification and predictive power after the probit model look quite good (very good, I think), but the goodness-of-fit test indicates that the model does not fit the data at all.

I am really clueless here, so I would appreciate any suggestion on why this might happen and, most importantly, on how I should proceed with testing and/or modelling.

Best regards,
JG

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
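
As a cross-check on the grouped table in the post, the Hosmer-Lemeshow statistic can be recomputed by hand as the sum, over the 10 groups, of (Obs - Exp)^2 / Exp for both the 1s and the 0s. The following is a minimal sketch in Python, not part of the original message; the counts are copied from the posted table, and the names `rows`, `hl_contribution`, `contributions` and `hl_chi2` are illustrative only.

```python
# Rows copied from the posted `estat gof, group(10) table` output:
# (group, observed 1s, expected 1s, observed 0s, expected 0s)
rows = [
    (1, 97, 50.2, 4148, 4194.8),
    (2, 267, 234.1, 3978, 4010.9),
    (3, 552, 674.2, 3693, 3570.8),
    (4, 2120, 2203.3, 2125, 2041.7),
    (5, 3445, 3475.3, 800, 769.7),
    (6, 3806, 3787.5, 439, 457.5),
    (7, 4004, 3947.5, 241, 297.5),
    (8, 4092, 4062.5, 153, 182.5),
    (9, 4170, 4157.5, 75, 87.5),
    (10, 4217, 4228.1, 28, 16.9),
]


def hl_contribution(obs1, exp1, obs0, exp0):
    """One group's contribution: Pearson terms for the 1s and for the 0s."""
    return (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0


# Per-group contributions, keyed by group number.
contributions = {g: hl_contribution(o1, e1, o0, e0) for g, o1, e1, o0, e0 in rows}

# The test statistic is the sum over all groups.
hl_chi2 = sum(contributions.values())

for group, value in contributions.items():
    print(f"group {group:2d}: {value:7.2f}")
print(f"Hosmer-Lemeshow chi2 (recomputed): {hl_chi2:.2f}")
```

The recomputed total lands close to the reported 109.99; a small gap is expected because the expected counts in the table are rounded to one decimal. The per-group breakdown shows which deciles drive the statistic, with group 1 contributing the most, followed by group 3.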

Follow-Ups:
Re: st: Fitting probit - estat gof puzzling results
From: "Matthew Baldwin, MD" <mrb45@columbia.edu>
