# Re: st: Binomial regression

 From rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp LP) To statalist@hsphsun2.harvard.edu Subject Re: st: Binomial regression Date Fri, 03 Aug 2007 16:23:53 -0500

```Vince Wiggins and I responded on this thread yesterday, in which one of our
main points was that forcing convergence when an identity link may not be
appropriate results in a loss of model interpretation.  This is a case where
not only is there no free lunch, but the exact price of that lunch is unclear.

We mentioned various ways that convergence could be achieved.  One method I
did not mention is the result of similar work I have done working with the log
link function.  This method carries with it an ease of interpretation (akin to
knowing the price of your lunch) and involves using a modified version of the

The Identity/Logit link is a simple modification of the standard identity
link that turns into a logit link outside a prespecified range, say outside
(0.01, 0.99), the default.  Inside this range, you have the identity function;
outside you have a logit function translated and scaled so as to be continuous
and have continuous derivatives at the endpoints.

Since the link function turns to logit at the endpoints, the range of the
identity/logit inverse link is within (0,1), meaning that you won't get
inadmissible predicted probabilities and convergence can be achieved over a
much wider variety of datasets.  The only drawback is one of interpretation,
however, the interpretation here is clear:  When the predicted probability is
in the range (0.01, 0.99), your coefficients reflect risk differences,
otherwise, they are (rescaled) log(odds ratio)'s.  In other words, you need
only qualify your treatment of risk differences to state that they are valid
for risks in the range (0.01, 0.99) rather than (0,1).  To me this seems
better than trying to interpret the fruits of an ad hoc adjustment.

I attach the code for -identitylogit- below.  A simple demonstration

. sysuse auto

. glm for price mpg, fam(bin) link(identity)

< no convergence >

. glm for price mpg, fam(bin) link(identitylogit) nolog

Generalized linear models                          No. of obs      =        74
Optimization     : ML                              Residual df     =        71
Scale parameter =         1
Deviance         =  70.50849514                    (1/df) Deviance =  .9930774
Pearson          =  68.72329876                    (1/df) Pearson  =  .9679338

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = u -> logit(u)            [Identity (0.010,0.990)]

AIC             =  1.033899
Log likelihood   = -35.25424757                    BIC             = -235.0801

------------------------------------------------------------------------------
|                 OIM
foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price |   .0000516   .0000157     3.29   0.001     .0000209    .0000823
mpg |   .0471231   .0093466     5.04   0.000     .0288041    .0654421
_cons |  -1.046464   .2299337    -4.55   0.000    -1.497126   -.5958023
------------------------------------------------------------------------------

The identity/logit link also accepts an optional argument, p, for controlling
the range of the identity part of this link.  For example,

. glm for price mpg, fam(bin) link(identitylogit 0.05)

would give a valid range for risk differences of (0.05, 0.95) rather than
the default (0.01, 0.99).

--Bobby
rgutierrez@stata.com

*! version 1.0.0   03aug2007
program define identitylogit
version 8.2
args todo eta mu return

if `todo' == -1 {                       /* Title */
if "\$SGLM_p" == "" {
global SGLM_p 0.01
}
else {
confirm number \$SGLM_p
if \$SGLM_p <= 0 | \$SGLM_p >= 0.5 {
di as error ///
"link parameter p must be between 0 and 0.5"
exit 459
}
}
if "\$SGLM_m" == "1" {
global SGLM_lf "u -> logit(u)"
}
else    {
global SGLM_lf "u/\$SGLM_m -> logit(u/\$SGLM_m)"
}
local a1 : di %5.3f \$SGLM_p
local a2 : di %5.3f 1-\$SGLM_p
global SGLM_lt "Identity (`a1',`a2')"
exit
}

tempname a b c d gp p
scalar `p' = \$SGLM_p
scalar `gp' = logit(`p')
scalar `b' = `p'*(1-`p')
scalar `a' = `p' - `b'*`gp'
scalar `d' = `b'
scalar `c' = 1-`p' - `d'*logit(1-`p')

if `todo' == 0 {			// eta = g(mu)
tempvar eta2
gen double `eta' = `mu'/\$SGLM_m
gen double `eta2' = `eta'
replace `eta' = logit(`mu'/\$SGLM_m)*`b' + `a'	///
if `eta2' <= `p'
replace `eta' = logit(`mu'/\$SGLM_m)*`d' + `c'	///
if `eta2' >= 1-`p'
exit
}
if `todo' == 1 {			// mu = g^-1(eta)
gen double `mu' = \$SGLM_m*`eta'
replace `mu' = \$SGLM_m*invlogit((`eta'-`a')/`b') ///
if `eta' <= `p'
replace `mu' = \$SGLM_m*invlogit((`eta'-`c')/`d') ///
if `eta' >= 1-`p'
exit
}
if `todo' == 2 {			// (d mu)/(d eta)
gen double `return' = \$SGLM_m
replace `return' = (`mu'*(1-`mu'/\$SGLM_m)) / `b' ///
if `eta' <= `p'
replace `return' = (`mu'*(1-`mu'/\$SGLM_m)) / `d' ///
if `eta' >= 1-`p'
exit
}
if `todo' == 3 {			// (d^2 mu)(d eta^2)
gen double `return' = 0
replace `return' = (`mu'*(1-`mu'/\$SGLM_m)* ///
(1-2*`mu'/\$SGLM_m)) / (`b'^2)   ///
if `eta' <= `p'
replace `return' = (`mu'*(1-`mu'/\$SGLM_m)* ///
(1-2*`mu'/\$SGLM_m)) / (`d'^2)   ///
if `eta' >= 1-`p'
exit
}
noi di as err "Unknown call to glm link function"
exit 198
end