

From: "Nick Cox" <n.j.cox@durham.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: probit vs. logit

Date: Tue, 25 May 2010 12:19:55 +0100

The answers "In principle, you can definitely prefer one to the other" and "In practice, the results may be very close" can both be true. (A statistical version of complementarity to please the ghost of Niels Bohr?)

Developing Michael's line of argument, one simple thing I see done less often than it might be is just to calculate predictions and compare. Thus predictions on a probability scale (p, say) can be got after -logit-

    . predict logit_p
    . label var logit_p "logit prediction"

and after -probit-

    . predict probit_p
    . label var probit_p "probit prediction"

Then any number of graphical and numerical comparisons are possible. The scatter plot

    . scatter logit_p probit_p

is the propaganda or sales pitch plot "Look, the predictions are the same, really!" while to turn a magnifying-glass on the fine structure of disagreement it may make as much or more sense to compare using log p, log(1 - p), logit p or yet other scales. Here the science underlying what is being done, assuming that there is some, is important in guiding assessment.

Nick
n.j.cox@durham.ac.uk

Michael N. Mitchell

I agree with Martin, that the choice of "logit" vs. "probit" appears to be largely discipline specific. If this is for publication or presentation, then it might be useful to see what the customs are for your audience.

If someone gets picky with you and really wants to see a comparison of the model fit of the two models, I think you could use -estimates store- and -estimates stats- (as shown below) to compare the fit of the models using the AIC and/or BIC (where a smaller value means better fit). As in the example below, the two values are nearly identical, and I think we all expect that this would generally be the case.

--- snip ---

    . sysuse auto
    (1978 Automobile Data)

    . logit foreign mpg price weight

    Iteration 0:   log likelihood =  -45.03321
    Iteration 1:   log likelihood = -22.244792
    Iteration 2:   log likelihood = -18.069284
    Iteration 3:   log likelihood = -17.184699
    Iteration 4:   log likelihood = -17.161975
    Iteration 5:   log likelihood = -17.161893
    Iteration 6:   log likelihood = -17.161893

    Logistic regression                               Number of obs   =         74
                                                      LR chi2(3)      =      55.74
                                                      Prob > chi2     =     0.0000
    Log likelihood = -17.161893                       Pseudo R2       =     0.6189

    ------------------------------------------------------------------------------
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -.1210918   .0956855    -1.27   0.206     -.308632    .0664483
           price |   .0009264   .0003074     3.01   0.003      .000324    .0015288
          weight |  -.0068497   .0019996    -3.43   0.001    -.0107688   -.0029306
           _cons |   14.42237   5.414367     2.66   0.008      3.81041    25.03434
    ------------------------------------------------------------------------------

    . estimates store model1

    . probit foreign mpg price weight

    Iteration 0:   log likelihood =  -45.03321
    Iteration 1:   log likelihood = -20.083125
    Iteration 2:   log likelihood = -17.363271
    Iteration 3:   log likelihood = -17.152935
    Iteration 4:   log likelihood = -17.151715
    Iteration 5:   log likelihood = -17.151715

    Probit regression                                 Number of obs   =         74
                                                      LR chi2(3)      =      55.76
                                                      Prob > chi2     =     0.0000
    Log likelihood = -17.151715                       Pseudo R2       =     0.6191

    ------------------------------------------------------------------------------
         foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -.0723615   .0556501    -1.30   0.193    -.1814337    .0367106
           price |   .0005185   .0001651     3.14   0.002      .000195    .0008421
          weight |  -.0038232   .0010392    -3.68   0.000      -.00586   -.0017864
           _cons |   8.150001   2.962982     2.75   0.006     2.342664    13.95734
    ------------------------------------------------------------------------------

    . estimates store model2

    . estimates stats model1 model2

    -----------------------------------------------------------------------------
           Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
    -------------+---------------------------------------------------------------
          model1 |     74   -45.03321   -17.16189      4     42.32379    51.54005
          model2 |     74   -45.03321   -17.15171      4     42.30343    51.51969
    -----------------------------------------------------------------------------
                   Note:  N=Obs used in calculating BIC; see [R] BIC note

--- snip ---

I hope that helps,

On 2010-05-24 11.36 PM, Maarten buis wrote:
> --- On Mon, 24/5/10, SR Millis wrote:
>> Logistic regression is generally preferred over the probit
>> model because of the wider variety of fit statistics. Also,
>> exponentiated logit coefficients can be interpreted as odds
>> ratios---which is not the case with probit coefficients.
>
> A general preference for one or the other is to a large
> extent discipline dependent. For example, within economics
> the probit is the "default" method. I like interpreting
> effects in terms of odds ratios as a way of identifying the
> scale, which is unidentified in a probit model (it is
> identified by fixing the residual variance to one, which
> has all kinds of nasty consequences when interpreting
> interaction terms). So, I tend to use the -logit-.
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
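
The two suggestions in this message, comparing predicted probabilities and comparing AIC/BIC, can be combined in one short do-file. What follows is only a sketch under stated assumptions: the variable name diff_p and the scalar names aic_logit, bic_logit, aic_probit and bic_probit are made up for illustration and do not appear in the thread. The formulas AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N) are the standard definitions, with k taken from e(rank) and N from e(N).

```stata
* Sketch only: diff_p and the aic_*/bic_* scalars are hypothetical names.
sysuse auto, clear

* Fit the logit, keep its predicted probabilities, compute AIC/BIC by hand.
quietly logit foreign mpg price weight
predict logit_p
label var logit_p "logit prediction"
scalar aic_logit = -2*e(ll) + 2*e(rank)
scalar bic_logit = -2*e(ll) + e(rank)*ln(e(N))

* Same steps for the probit.
quietly probit foreign mpg price weight
predict probit_p
label var probit_p "probit prediction"
scalar aic_probit = -2*e(ll) + 2*e(rank)
scalar bic_probit = -2*e(ll) + e(rank)*ln(e(N))

* Numerical comparison of the two sets of predictions.
generate double diff_p = logit_p - probit_p
summarize diff_p, detail
correlate logit_p probit_p

* Hand-computed information criteria, for comparison with -estimates stats-.
display "logit:  AIC = " aic_logit "  BIC = " bic_logit
display "probit: AIC = " aic_probit "  BIC = " bic_probit
```

Run on the same auto data, the four scalars should agree with the AIC and BIC columns of the -estimates stats- table in the message, since they are computed from the same log likelihoods, parameter counts and N. The -summarize- of diff_p gives a numerical counterpart to the scatter plot, showing how large the disagreement between the two sets of predictions gets on the probability scale.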

**Follow-Ups**:
- **AW: st: probit vs. logit** *From:* "Martin Weiss" <martin.weiss1@gmx.de>

**References**:
- **Re: st: probit vs. logit** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
- **Re: st: probit vs. logit** *From:* "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
