
st: RE: -binreg-


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: -binreg-
Date   Thu, 21 Nov 2002 19:18:06 -0000

Jay Kaufman
>
> The -binreg- routine fits generalized linear models for the
> binomial family.  It is presumably preferred over fitting the same
> model in -glm-, not only for the convenience of not having to
> specify the distributional family in the command line, but also
> because in iteratively seeking the estimates it checks to make sure
> that they are consistent with the range of allowable probabilities
> (i.e. 0 to 1), as described on page 138 of the manual [Ref A-G].
> So my question is, why does -binreg- appear to be so
> bad at this checking?
>
> Take a very simple model using the auto.dta.
>
> . use "C:\Stata\auto.dta", clear
> (1978 Automobile Data)
>
> . binreg  foreign mpg, rr
>
> Residual df  =        72                   No. of obs =        74
> Pearson X2   =  73.88014                   Deviance   =  78.99933
> Dispersion   =  1.026113                   Dispersion =  1.097213
>
> Bernoulli distribution, log link
> ----------------------------------------------------------------------
>         |                 EIM
> foreign | Risk Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
> --------+-------------------------------------------------------------
>     mpg |   1.097213   .0109901   9.26   0.000    1.075883   1.118966
> ----------------------------------------------------------------------
>
> . predict phat, mu
>
> . sum phat
>
>     Variable |     Obs        Mean   Std. Dev.       Min        Max
> -------------+-----------------------------------------------------
>         phat |      74    .3008965     .22691   .1072727   1.580984
>
> Clearly a predicted probability > 1.5 is not a good estimate.  Did
> I do something wrong?  Or did -binreg- do something wrong?  Or is
> this simply another example of why linear models of the logit and
> probit have dominated analysis of binary data for decades?
>
> By the way, note that if I fit the exact same model using -glm-,
> this same observation gets a predicted probability of 1.43, so
> -binreg- actually seems to do worse.

This kind of comment could be made whenever
a model yields predictions that violate limits
known to the modeller. Yet models with inappropriate
limiting behaviour remain in the repertoire for
various reasons: sometimes force of habit or tradition,
but often because they are adequate, or even best,
for the range of data found in practice.

My interpretation is that, as in some of those
children's stories, you got exactly what you asked for,
and that's your punishment.

Specifically, as far as I can tell from a scan
of the code, -binreg- makes no such check.
With the -rr- option the link is the log, so the
fitted mean is exp(xb), which exceeds 1 whenever
xb > 0. As you imply, a prerogative
of the modeller, and usually a mark of
good modelling taste, is to prefer a model
that will always yield qualitatively
correct predictions, as is guaranteed
here by appropriate choice of link.

With the -or- option (i.e. logit link)
predictions are bounded appropriately
and very close to what the equivalent -glm-
yields.
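
For anyone who wants to check this, here is a minimal sketch of the
comparison. It assumes a Stata release that has -sysuse- (the original
example loaded auto.dta from a local path instead), and the variable
names phat_rr, phat_or and phat_glm are my own choices for illustration.
Results for the log-link fit may differ across Stata releases, so no
output is shown.

```stata
* Load the auto data shipped with Stata (assumes -sysuse- is available)
sysuse auto, clear

* Log link (risk ratios): fitted mean is exp(xb), not bounded above by 1
binreg foreign mpg, rr
predict phat_rr, mu

* Logit link (odds ratios): fitted mean is invlogit(xb), always in (0, 1)
binreg foreign mpg, or
predict phat_or, mu

* The same logit-link model fitted with -glm-, for comparison
glm foreign mpg, family(binomial) link(logit)
predict phat_glm, mu

* Compare the range of predictions from the three fits
summarize phat_rr phat_or phat_glm
```

The Min and Max columns of the final -summarize- are the quickest way to
see whether any fitted probabilities fall outside the unit interval.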

Nick
n.j.cox@durham.ac.uk
