# st: RE: curious behavior of glm

 From jhilbe@aol.com To statalist@hsphsun2.harvard.edu Subject st: RE: curious behavior of glm Date Fri, 05 Jun 2009 12:09:22 -0400

Regarding the estimation of 1) a single-observation logistic model and 2) a two-observation logistic model, both in binomial form, with y the binomial numerator and n the denominator:
When you use -cii-, or work with any simple case in which the estimated coefficient or odds ratio is computed directly from the binomial PDF, you are of course more likely to get a meaningful result. Maximum likelihood estimation entails assumptions that are not met in such a situation. In fact, you cannot even get results using exact logistic regression via the -exlogistic- command. On the other hand, -exlogistic- does estimate the second situation, where you have two observations, each with response y, binomial denominator n, and binary predictor x. However, you do not get exact values, but rather median unbiased estimates.
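As a rough illustration of what "computed directly from the binomial PDF" can look like, here is a sketch of an exact (Clopper-Pearson) upper confidence limit for a group with 0 successes in n = 100 trials, such as the x = 0 group in the two-observation data of this post. Choosing the Clopper-Pearson construction is an assumption made for illustration; the symbols k, p_U, and α are not part of the original message.

$$
\begin{aligned}
\Pr(Y = 0 \mid p) &= (1 - p)^{n}, \\
(1 - p_U)^{n} &= \alpha/2 \;\;\Longrightarrow\;\; p_U = 1 - (\alpha/2)^{1/n}, \\
n = 100,\ \alpha = 0.05: \quad p_U &= 1 - 0.025^{1/100} \approx 0.0362 .
\end{aligned}
$$

The lower limit is 0 when k = 0, so the interval is finite and interpretable even though the corresponding log odds are not.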
```
 y    n   x
-------------
10  100   1
 0  100   0
```

Model the above using -exlogistic-:

```
. input r n x

             r          n          x
  1. 10 100 1
  2. 0 100 0
  3. end

. exlogistic y x, binomial(n) coef estc

Enumerating sample-space combinations:
observation 1:   enumerations =         11
observation 2:   enumerations =        101
observation 3:   enumerations =      10201
note: CMLE estimate for x is +inf; computing MUE
note: CMLE estimate for _cons is -inf; computing MUE
note: .975 quantile estimate for _cons failed to bracket the value

Exact logistic regression                        Number of obs =        200
Binomial variable: n                             Model score   =   10.47368
                                                 Pr >= score   =     0.0015
---------------------------------------------------------------------------
           y |      Coef.       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
           x |  2.722305*          10      0.0015      .8727845        +Inf
       _cons |         0*          10      0.0000          -Inf        +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)
```
I requested estimation of a constant although it is obvious that it is not meaningful in such a situation.
Compare the above with the clearly mistaken "estimated coefficients" that you provided in your output.
```
. glm r x, fam(bin n)

Generalized linear models                          No. of obs      =         2
Optimization     : ML                              Residual df     =         0
                                                   Scale parameter =         1
Deviance         =  2.00000e-08                    (1/df) Deviance =         .
Pearson          =  1.00000e-08                    (1/df) Pearson  =         .

Variance function: V(u) = u*(1-u/n)                [Binomial]
Link function    : g(u) = ln(u/(n-u))              [Logit]

                                                   AIC             =  4.025974
Log likelihood   = -2.025973987                    BIC             =  2.00e-08

------------------------------------------------------------------------------
             |                 OIM
           r |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   23.87722      10000     0.00   0.998    -19575.76    19623.52
       _cons |  -26.07444      10000    -0.00   0.998    -19625.71    19573.56
------------------------------------------------------------------------------
```
These coefficients indicate a problem with convergence. Exponentiate to obtain an odds ratio:
```
. di %12.0f exp(23.87722)
23428521860
```

We have an odds ratio here of some 23.4 billion. No surprise.

The problem is that the assumptions upon which ML estimation is based are not met here. I tried your examples with several other commercial applications, as well as R, with the same results.
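To sketch why ML breaks down for the two-observation data, write the binomial logit log likelihood with the linear predictor \(\beta_0 + \beta_1 x\). The notation \(\ell\), \(\eta_1\), and \(c\) is introduced here for illustration only and is not part of the original message:

$$
\begin{aligned}
\ell(\beta_0, \beta_1) &= c + 10\,\eta_1 - 100 \ln\!\left(1 + e^{\eta_1}\right) - 100 \ln\!\left(1 + e^{\beta_0}\right),
\qquad \eta_1 = \beta_0 + \beta_1, \quad c = \ln \binom{100}{10}, \\
\frac{\partial \ell}{\partial \eta_1} = 0 \;&\Longrightarrow\; \hat{\eta}_1 = \ln\frac{0.1}{0.9} = -\ln 9 \approx -2.1972, \\
-100 \ln\!\left(1 + e^{\beta_0}\right) &\uparrow 0 \quad \text{as } \beta_0 \to -\infty \quad \text{(the supremum is never attained)} .
\end{aligned}
$$

So the likelihood keeps increasing as \(\beta_0 \to -\infty\) and \(\beta_1 = \hat{\eta}_1 - \beta_0 \to +\infty\), and no finite maximum exists. The iterations simply stop once the improvement falls below tolerance. This is consistent with the -glm- output above: the reported coefficients sum to \(-26.07444 + 23.87722 = -2.19722 \approx -\ln 9\), and the reported log likelihood matches \(\ln\binom{100}{10} + 10 \ln 0.1 + 90 \ln 0.9 \approx -2.026\).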
The bottom line is that there is nothing wrong with -glm- here.

Joseph Hilbe