

From: "Matthew Baldwin, MD" <mrb45@columbia.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Fitting probit - estat gof puzzling results
Date: Sun, 28 Aug 2011 15:17:50 -0400

Matt

Quoting J Gonzalez <jgonzalez.1981@yahoo.com>:

Dear Stata list members

I am trying to estimate a probit model to understand which variables influence (and how they do it) the decision of an individual to apply for a health prevention program.

I have a dataset (nearly 40 thousand obs) with information about applicants and non applicants, containing variables with individual's information on demographics, health status and health related risk factors, as well as socioeconomic indicators (education, employment and housing information). With this information I am trying to fit a probit model to estimate the individual's probability of applying for the program, given variables like age, educ, health status indicators and so on (theoretically, those variables might affect the decision to apply).

I am not an expert, so I checked the stata probit post estimation examples in the base reference manual, and I found several commands useful to test the goodness of fit of my model, and here's how it looks.
__________________________________________________________

    estat clas, all

    Correctly classified = 90.02%
    Sensitivity          = 93.94%
    Specificity          = 83.31%

So, it seems quite good classification power (though a little bit better for the positive-outcome cases)
__________________________________________________________

Then I looked at the prediction and it looks like this (mean quite similar).

    predict p
    sum p apply

        Variable |    Obs       Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
               p |  42450   .6306243    .3935977   .0002053   .9999337
           apply |  42451   .6306094    .4826455          0          1
__________________________________________________________

Then, using lroc

    area under ROC curve = 0.9488

So, following Stata base reference manual, "The greater the predictive power, the more bowed the curve, and hence the area beneath the curve is often used as a measure of the predictive power. A model with no predictive power has area 0.5; a perfect model has area 1", hence, I guess the model is quite good because the area under the ROC curve in my model is pretty much closer to a perfect model, than a model without predictive power.
__________________________________________________________

HOWEVER, estat gof does not seem to tell the same story

Actually, it is the opposite story, because the null hypothesis is soundly rejected, indicating that the model does not fit the data (am I right?).

    . estat gof

    Probit model for apply, goodness-of-fit test

          number of observations =     42450
    number of covariate patterns =     42409
             Pearson chi2(42245) =  58810.50
                     Prob > chi2 =    0.0000

    . estat gof, group(10) table

    Probit model for apply, goodness-of-fit test
    (Table collapsed on quantiles of estimated probabilities)

    +----------------------------------------------------------+
    | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
    |-------+--------+-------+--------+-------+--------+-------|
    |     1 | 0.0293 |    97 |   50.2 |  4148 | 4194.8 |  4245 |
    |     2 | 0.0881 |   267 |  234.1 |  3978 | 4010.9 |  4245 |
    |     3 | 0.2731 |   552 |  674.2 |  3693 | 3570.8 |  4245 |
    |     4 | 0.7419 |  2120 | 2203.3 |  2125 | 2041.7 |  4245 |
    |     5 | 0.8664 |  3445 | 3475.3 |   800 |  769.7 |  4245 |
    |-------+--------+-------+--------+-------+--------+-------|
    |     6 | 0.9136 |  3806 | 3787.5 |   439 |  457.5 |  4245 |
    |     7 | 0.9445 |  4004 | 3947.5 |   241 |  297.5 |  4245 |
    |     8 | 0.9689 |  4092 | 4062.5 |   153 |  182.5 |  4245 |
    |     9 | 0.9893 |  4170 | 4157.5 |    75 |   87.5 |  4245 |
    |    10 | 1.0000 |  4217 | 4228.1 |    28 |   16.9 |  4245 |
    +----------------------------------------------------------+

          number of observations =     42450
                number of groups =        10
         Hosmer-Lemeshow chi2(8) =    109.99
                     Prob > chi2 =    0.0000
__________________________________________________________

Why it might happen something like this?, that classification and predictive power after a probit model looks quite good (actually very good I think), but the goodness of fit test indicates that the model does not fit the data, at all?

I am really clueless here, so I would really appreciate any suggestion on why it might happen, and most importantly, how should I proceed on testing it and/or modelling.

Best regards,
JG

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
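For readers who want to see where the grouped statistic in the quoted output comes from, here is a minimal Python sketch (not Stata, and not part of the original message) that recomputes a Hosmer-Lemeshow-style chi-square from the ten rows of the `estat gof, group(10) table` output. It assumes the usual form of the statistic, the sum over groups of (observed - expected)^2 / expected for both outcome cells, with groups - 2 degrees of freedom. The function names are illustrative only.

```python
from math import exp, factorial

# (Obs_1, Exp_1, Obs_0, Exp_0) for groups 1..10, copied from the quoted table.
rows = [
    (97,   50.2,   4148, 4194.8),
    (267,  234.1,  3978, 4010.9),
    (552,  674.2,  3693, 3570.8),
    (2120, 2203.3, 2125, 2041.7),
    (3445, 3475.3, 800,  769.7),
    (3806, 3787.5, 439,  457.5),
    (4004, 3947.5, 241,  297.5),
    (4092, 4062.5, 153,  182.5),
    (4170, 4157.5, 75,   87.5),
    (4217, 4228.1, 28,   16.9),
]


def hosmer_lemeshow(table):
    """Sum of (observed - expected)^2 / expected over both cells of each group."""
    return sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
               for o1, e1, o0, e0 in table)


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


hl = hosmer_lemeshow(rows)
df = len(rows) - 2          # assumed convention: number of groups minus 2
p = chi2_upper_tail_even_df(hl, df)

print(f"Hosmer-Lemeshow chi2({df}) = {hl:.2f}, Prob > chi2 = {p:.4f}")

# Per-group contributions show which deciles drive the statistic.
for g, (o1, e1, o0, e0) in enumerate(rows, start=1):
    contrib = (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    print(f"group {g:2d}: contribution = {contrib:6.2f}")
```

The total should land close to the reported 109.99; any small gap presumably reflects the expected counts being rounded to one decimal in the printed table. The per-group breakdown makes it easy to see that a few groups account for most of the statistic.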

Matthew Baldwin, MD
Clinical Fellow
Department of Pulmonary, Allergy, and Critical Care Medicine
New York Presbyterian Hospital
Columbia University College of Physicians and Surgeons
P: 917-899-2187
C: 917-846-7560

**References**: **st: Fitting probit - estat gof puzzling results** *From:* J Gonzalez <jgonzalez.1981@yahoo.com>
