RE: st: probit vs. logit

From: "Nick Cox" | Subject: RE: st: probit vs. logit | Date: Tue, 25 May 2010 12:19:55 +0100

```
The answers "In principle, you can definitely prefer one to the other"
and "In practice, the results may be very close" can both be true. (A
statistical version of complementarity to please the ghost of Niels
Bohr?)

Developing Michael's line of argument, one simple thing that I see done
less often than it might be is just to calculate predictions and compare. Thus
predictions on a probability scale (p, say) can be got after -logit-

. predict logit_p
. label var logit_p "logit prediction"

and after -probit-

. predict probit_p
. label var probit_p "probit prediction"

Then any number of graphical and numerical comparisons are possible. The
scatter plot

. scatter logit_p probit_p

is the propaganda or sales pitch plot "Look, the predictions are the
same, really!" while to turn a magnifying-glass on the fine structure of
disagreement it may make as much or more sense to compare using log p,
log(1 - p), logit p or yet other scales.
Here the science underlying what is being done, assuming that there is
some, is important in guiding assessment.

Nick
n.j.cox@durham.ac.uk

Michael N. Mitchell

I agree with Martin, that the choice of "logit" vs. "probit" appears to be
largely discipline specific. If this is for publication or presentation, then
it might be useful to see what the customs are for your audience.

If someone gets picky with you and really wants to see a comparison of the
model fit of the two models, I think you could use -estimates store- and
-estimates stats- (as shown below) to compare the fit of the models using the
AIC and/or BIC (where a smaller value means better fit). As in the example
below, the two values are nearly identical, and I think we all expect that
this would generally be the case.

--- snip ---

. sysuse auto
(1978 Automobile Data)

. logit  foreign mpg price weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -22.244792
Iteration 2:   log likelihood = -18.069284
Iteration 3:   log likelihood = -17.184699
Iteration 4:   log likelihood = -17.161975
Iteration 5:   log likelihood = -17.161893
Iteration 6:   log likelihood = -17.161893

Logistic regression                               Number of obs   =         74
                                                  LR chi2(3)      =      55.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.161893                       Pseudo R2       =     0.6189

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1210918   .0956855    -1.27   0.206     -.308632    .0664483
       price |   .0009264   .0003074     3.01   0.003      .000324    .0015288
      weight |  -.0068497   .0019996    -3.43   0.001    -.0107688   -.0029306
       _cons |   14.42237   5.414367     2.66   0.008      3.81041    25.03434
------------------------------------------------------------------------------

. estimates store model1

. probit  foreign mpg price weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -20.083125
Iteration 2:   log likelihood = -17.363271
Iteration 3:   log likelihood = -17.152935
Iteration 4:   log likelihood = -17.151715
Iteration 5:   log likelihood = -17.151715

Probit regression                                 Number of obs   =         74
                                                  LR chi2(3)      =      55.76
                                                  Prob > chi2     =     0.0000
Log likelihood = -17.151715                       Pseudo R2       =     0.6191

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0723615   .0556501    -1.30   0.193    -.1814337    .0367106
       price |   .0005185   .0001651     3.14   0.002      .000195    .0008421
      weight |  -.0038232   .0010392    -3.68   0.000      -.00586   -.0017864
       _cons |   8.150001   2.962982     2.75   0.006     2.342664    13.95734
------------------------------------------------------------------------------

. estimates store model2

. estimates stats model1 model2

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
      model1 |     74   -45.03321   -17.16189      4     42.32379    51.54005
      model2 |     74   -45.03321   -17.15171      4     42.30343    51.51969
-----------------------------------------------------------------------------
Note:  N=Obs used in calculating BIC; see [R] BIC note

--- snip ----

I hope that helps,

On 2010-05-24 11.36 PM, Maarten Buis wrote:
> --- On Mon, 24/5/10, SR Millis wrote:
>> Logistic regression is generally preferred over the probit
>> model because of the wider variety of fit statistics. Also,
>> exponentiated logit coefficients can be interpreted as odds
>> ratios---which is not the case with probit coefficients.
>
> A general preference for one or the other is to a large
> extent discipline dependent. For example, within economics
> the probit is the "default" method. I like interpreting
> effects in terms of odds ratios as a way of identifying the
> scale, which is unidentified in a probit model (it is
> identified by fixing the residual variance to one, which
> has all kinds of nasty consequences when interpreting
> interaction terms). So, I tend to use the -logit-.
>

```
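
The two suggestions in this thread (compare the predictions directly, and compare AIC/BIC) can be put together in one short do-file. What follows is only a sketch under stated assumptions: it reuses the auto data and the model from Michael's example, and the variable names diff_p and diff_logit are made up here for illustration and do not appear in the original posts. The last two lines recompute AIC and BIC for the logit fit from the usual definitions, AIC = -2 ll + 2 k and BIC = -2 ll + k ln(N), so that they can be checked against the -estimates stats- table above.

```stata
* Sketch only: diff_p and diff_logit are illustrative names, not from the thread.
sysuse auto, clear

quietly logit foreign mpg price weight
predict logit_p                      // default after -logit- is Pr(foreign = 1)
label var logit_p "logit prediction"

quietly probit foreign mpg price weight
predict probit_p                     // default after -probit- is Pr(foreign = 1)
label var probit_p "probit prediction"

* Numerical comparison on the probability scale
generate diff_p = logit_p - probit_p
label var diff_p "logit - probit, probability scale"
summarize diff_p, detail
correlate logit_p probit_p

* The "sales pitch" plot, with a line of equality added for reference
twoway (scatter logit_p probit_p) (function y = x, range(0 1)), ///
    ytitle("logit prediction") xtitle("probit prediction") legend(off)

* The magnifying glass: differences on the logit scale
generate diff_logit = logit(logit_p) - logit(probit_p)
label var diff_logit "logit - probit, logit scale"
scatter diff_logit probit_p, yline(0)

* Hand check of AIC and BIC for the logit fit (ll = -17.161893, k = 4, N = 74)
display -2*(-17.161893) + 2*4
display -2*(-17.161893) + 4*ln(74)
```

Plotting the difference rather than the two sets of predictions against each other uses the whole vertical axis for the disagreement, which is usually where any systematic pattern near 0 or 1 shows up. Which scale is most informative remains a judgment call that depends on the application.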