



Re: Re: st: Fitting probit - estat gof puzzling results


From   J Gonzalez <jgonzalez.1981@yahoo.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: Re: st: Fitting probit - estat gof puzzling results
Date   Thu, 1 Sep 2011 10:42:58 +0100 (BST)

Clyde Schechter, thank you very much for your guidance. It has been very helpful indeed.

Although I think I will stop at that model for my current project, your explanation raised a couple of questions for me.

You said models like this (that discriminate quite well but are not well calibrated), "may still be useful for understanding factors that promote or inhibit applying, even if they are not well calibrated--but such models would not be suitable for some other purposes". My questions are two-fold:

1) If the data generating process does not match the model I am using, then the assumptions about the variables' distributions might not hold (for example, the probability distribution of the residuals). Therefore, every statistic that depends on such assumptions might also be unreliable, and hence even the simplest hypothesis test might be untrustworthy (for example, whether a variable's coefficient is statistically different from 0). Am I right? If so, why and/or how may the model "still be useful for understanding factors that promote or inhibit applying"? I am pretty sure that you are right in saying that; however, I cannot figure out how it would be justified from a theoretical point of view.


Well, as I said, I am not an expert in this, so I apologize in advance if the question sounds silly, but everything I have read about econometrics points out (or implies, or suggests) that if the assumption about the nature of the data generating process does not hold, the model is not reliable (not consistent, and sometimes even biased). That is probably because I am relying mostly on textbooks (Wooldridge, 2002) and hence trying to test every testable assumption of the model, but it would be great if you could point me to literature where I can go deeper into the usefulness and properties of models that are not well calibrated.

2) What kinds of purposes might such a model not be suitable for? For example, an out-of-sample prediction of the probability of applying given the RHS variables?


Once again, thank you for your help.

Best regards,

Jesús González


Wooldridge, J.M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.





----- Original Message -----
From: Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Cc: 
Sent: 18:40 Tuesday, 30 August 2011
Subject: Re: Re: st: Fitting probit - estat gof puzzling results

So, after revising his model, Jesus Gonzalez gets these calibration results:

  +----------------------------------------------------------+
  | Group |   Prob | Obs_1 |  Exp_1 | Obs_0 |  Exp_0 | Total |
  |-------+--------+-------+--------+-------+--------+-------|
  |     1 | 0.0968 |   225 |  222.5 |  4021 | 4023.5 |  4246 |
  |     2 | 0.1928 |   635 |  607.4 |  3610 | 3637.6 |  4245 |
  |     3 | 0.3265 |  1080 | 1083.3 |  3165 | 3161.7 |  4245 |
  |     4 | 0.5803 |  1861 | 1873.9 |  2384 | 2371.1 |  4245 |
  |     5 | 0.8053 |  3097 | 3020.4 |  1148 | 1224.6 |  4245 |
  |-------+--------+-------+--------+-------+--------+-------|
  |     6 | 0.8871 |  3669 | 3610.0 |   576 |  635.0 |  4245 |
  |     7 | 0.9342 |  3861 | 3873.4 |   384 |  371.6 |  4245 |
  |     8 | 0.9665 |  4016 | 4038.1 |   229 |  206.9 |  4245 |
  |     9 | 0.9899 |  4122 | 4155.9 |   123 |   89.1 |  4245 |
  |    10 | 1.0000 |  4204 | 4229.5 |    41 |   15.5 |  4245 |
  +----------------------------------------------------------+
I would consider these eye-poppingly good. And since the goal is to understand the factors that influence the decision to apply for the program, I think it would be difficult to meaningfully improve on this. I would still disregard the p-value: it will remain "on steroids" as long as you use this huge sample. Perhaps further tweaking of the model will produce slight improvements in fit, but I'd be surprised if what you learn from them will be worth the effort.
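
As a rough check on the point about the p-value, the Hosmer-Lemeshow statistic can be recomputed from the table above as the sum, over groups and both outcomes, of (observed - expected)^2 / expected. The sketch below is in Python rather than Stata and uses the expected counts as printed (rounded to one decimal), so it will not exactly reproduce what estat gof reports; the function and variable names are illustrative only.

```python
# Rows copied from the calibration table: (Obs_1, Exp_1, Obs_0, Exp_0).
# Expected counts are rounded as printed, so the results are approximate.
ROWS = [
    (225, 222.5, 4021, 4023.5),
    (635, 607.4, 3610, 3637.6),
    (1080, 1083.3, 3165, 3161.7),
    (1861, 1873.9, 2384, 2371.1),
    (3097, 3020.4, 1148, 1224.6),
    (3669, 3610.0, 576, 635.0),
    (3861, 3873.4, 384, 371.6),
    (4016, 4038.1, 229, 206.9),
    (4122, 4155.9, 123, 89.1),
    (4204, 4229.5, 41, 15.5),
]


def hl_chi2(rows):
    """Hosmer-Lemeshow statistic: sum of (O - E)^2 / E over both outcomes."""
    return sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
               for o1, e1, o0, e0 in rows)


def max_rate_gap(rows):
    """Largest |observed - expected| proportion of 1s in any group."""
    return max(abs(o1 - e1) / (o1 + o0) for o1, e1, o0, _ in rows)


chi2 = hl_chi2(ROWS)
gap = max_rate_gap(ROWS)
df = len(ROWS) - 2  # usual reference distribution: chi-square, groups - 2 df
print(f"H-L chi2 = {chi2:.1f} on {df} df; largest rate gap = {gap:.3f}")
```

With these rounded inputs the statistic comes out far above 15.51, the 5% critical value for 8 degrees of freedom, even though no group's observed rate differs from its expected rate by as much as two percentage points. With roughly 42,000 observations, that is the sense in which the p-value is "on steroids".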

Remember, it is highly unlikely that the real data generating process here is in fact a probit model based on variables you have measured or even could measure in principle.  This is one of those wrong models that I think Box would have called useful.  As long as there is even a tiny difference between the real data generating process and your statistical model, you are likely to detect that difference in a sample this size when you test calibration.  It is likely that any attempt to get your H-L chi square into non-significant territory will either fail, or will succeed at the price of fitting the noise in your data (e.g. a saturated model).

Finally, let me rant (mildly and briefly) about your lower level of concern for discrimination. Suppose the real data generating process were that participants apply to the program with probability p = some function of an unobserved variable, u, which is independent of all your observed variables. When you fit a model based on the x's, you will, with some noise, get a model that predicts, more or less, probability = p0 for all comers (where p0 is the marginal probability of applying to the program). That model will be almost perfectly calibrated: in each decile of predicted probability the observed and predicted probabilities will match up very closely, both being approximately p0. But the model is completely uninformative as to _which_ subjects are applying and which are not. You would need to look at the area under the ROC curve, which will be very close to 0.5 in this situation, to find that out from a summary statistic.
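
The thought experiment above can be simulated in a few lines. This is only a sketch in Python, not a fitted probit: the form of p as a function of u, the sample size, the seed, and the use of "marginal rate plus a little x-driven noise" as a stand-in for the fitted values are all assumptions made for illustration.

```python
import random

random.seed(1)  # arbitrary seed so the sketch is repeatable
N = 20000


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum formula (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n1 = sum(labels)
    n0 = len(labels) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


# True process: P(apply) depends only on an unobserved u (hypothetical form).
u = [random.random() for _ in range(N)]
y = [1 if random.random() < 0.4 + 0.6 * ui else 0 for ui in u]

# The observed covariate x is independent of u, so it carries no information on y.
x = [random.random() for _ in range(N)]

# Stand-in for fitted values: the marginal rate p0 plus a little x-driven noise.
p0 = sum(y) / N
pred = [p0 + 0.01 * (xi - 0.5) for xi in x]

# Calibration: compare mean predicted and observed rates within deciles of pred.
order = sorted(range(N), key=pred.__getitem__)
size = N // 10
gaps = []
for g in range(10):
    idx = order[g * size:(g + 1) * size]
    mean_pred = sum(pred[i] for i in idx) / size
    obs_rate = sum(y[i] for i in idx) / size
    gaps.append(abs(obs_rate - mean_pred))

model_auc = auc(pred, y)
print(f"largest decile gap = {max(gaps):.3f}, AUC = {model_auc:.3f}")
```

In a run like this the decile gaps should be small (the predictions look well calibrated) while the AUC sits near 0.5, since the ranking induced by x says nothing about who applies.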

Now that is probably not a very realistic scenario, but I'm trying to make clear the point that even a perfectly calibrated model can fail to distinguish appliers from non-appliers in any useful way.  If the discrimination is not good, the model is not useful for your purposes, no matter how well it is calibrated.  (By contrast, models that discriminate well may still be useful for understanding factors that promote or inhibit applying, even if they are not well calibrated--but such models would not be suitable for some other purposes.)
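
The converse case in the parenthesis, good discrimination with poor calibration, can be sketched the same way. The example below (Python, with hypothetical names and an arbitrary cubic distortion chosen only for illustration) relies on the fact that the area under the ROC curve depends only on the ranking of the predictions, so any strictly increasing distortion of the probabilities leaves it unchanged while ruining calibration.

```python
import random

random.seed(2)  # arbitrary seed so the sketch is repeatable
N = 20000


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum formula (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n1 = sum(labels)
    n0 = len(labels) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


# True probabilities vary widely across subjects, so they separate 1s from 0s well.
p_true = [random.random() for _ in range(N)]
y = [1 if random.random() < p else 0 for p in p_true]

# Miscalibrated predictions: same ranking, wrong probabilities (hypothetical distortion).
p_bad = [p ** 3 for p in p_true]

auc_true = auc(p_true, y)
auc_bad = auc(p_bad, y)
obs_rate = sum(y) / N
mean_pred_bad = sum(p_bad) / N

print(f"AUC with true p = {auc_true:.3f}, AUC with distorted p = {auc_bad:.3f}")
print(f"observed rate = {obs_rate:.3f}, mean distorted prediction = {mean_pred_bad:.3f}")
```

The two AUC values agree, but the distorted predictions understate the overall rate of applying by a wide margin, which is why such a model could still rank subjects sensibly yet be unsuitable where the predicted probabilities themselves matter.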

Good luck with the rest of your project!

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

