Perhaps I'm not making myself clear. There are two issues in my original post.
1. Why is it that glm refuses to calculate a binomial proportion (when r != 0 & r != n)?
2. Why doesn't glm give an error message and give up (as would logit) in a case where the coefficients are clearly nonestimable by ML?
Both problems are trivial; there are easy ways to work around them. I was just hoping that, if there isn't any theoretical reason for glm to behave this way, a future update might make glm behave more like logit in these situations.
Yours,
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 05 June 2009 17:09
To: [email protected]
Subject: st: RE: curious behavior of glm
Regarding the estimation of 1) a single-observation logistic model, and
2) a two-observation logistic model, both in binomial form, with y
being the binomial numerator and n the denominator:
When you use cii, or work with a simple case where the estimated
coefficient or odds ratio is computed directly from the binomial PDF,
you are of course more likely to get a meaningful result. Using maximum
likelihood entails assumptions which are not met in such a situation.
In fact, you cannot even get results for the single-observation model
using exact logistic regression via the exlogistic command. On the
other hand, exlogistic does estimate the second situation, where you
have two observations, each with response y, binomial denominator n,
and binary predictor x. However, you do not get exact values, but
rather median unbiased estimates.
y n x

10 100 1
0 100 0
Model the above using exlogistic:
. input r n x
r n x
1. 10 100 1
2. 0 100 0
3. end
. exlogistic y x, binomial(n) coef estc
Enumerating samplespace combinations:
observation 1: enumerations = 11
observation 2: enumerations = 101
observation 3: enumerations = 10201
note: CMLE estimate for x is +inf; computing MUE
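note: CMLE estimate for _cons is -inf; computing MUE
note: .975 quantile estimate for _cons failed to bracket the value

Exact logistic regression                        Number of obs =       200
Binomial variable: n                             Model score   =  10.47368
                                                 Pr >= score   =    0.0015
---------------------------------------------------------------------------
           y |      Coef.       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
           x |   2.722305*         10      0.0015      .8727845        +Inf
       _cons |          0*         10      0.0000          -Inf        +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)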
I requested estimation of a constant although it is obvious that it is
not meaningful in such a situation.
Compare the above with the clearly mistaken "estimated coefficients"
that you provided in your output.
. glm r x, fam(bin n)
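Generalized linear models                          No. of obs      =         2
Optimization     : ML                              Residual df     =         0
                                                   Scale parameter =         1
Deviance         =  2.00000e-08                    (1/df) Deviance =         .
Pearson          =  1.00000e-08                    (1/df) Pearson  =         .

Variance function: V(u) = u*(1-u/n)                [Binomial]
Link function    : g(u) = ln(u/(n-u))              [Logit]

                                                   AIC             =  4.025974
Log likelihood   = -2.025973987                    BIC             =  2.00e-08

------------------------------------------------------------------------------
             |                 OIM
           r |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   23.87722      10000     0.00   0.998    -19575.76    19623.52
       _cons |  -26.07444      10000    -0.00   0.998    -19625.71    19573.56
------------------------------------------------------------------------------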
These coefficients indicate a problem with convergence. Exponentiate to
obtain an odds ratio:
. di %12.0f exp(23.87722)
23428521860
We have an odds ratio here of some 23.4 billion. No surprise.
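The same arithmetic can be checked outside Stata. The short Python sketch below is only an illustration, not part of the glm session. It takes the constant as -26.07444; the negative sign is an assumption, being the only sign under which the two coefficients sum to logit(10/100). The variable names are made up for the example.

```python
import math

b_x = 23.87722       # coefficient on x as printed by glm
b_cons = -26.07444   # constant; the negative sign is an assumption (see lead-in)

# Odds ratio for x, i.e. what "di %12.0f exp(23.87722)" displays.
odds_ratio = math.exp(b_x)

# Fitted log odds and proportion for the x = 1 group (r = 10, n = 100).
eta_x1 = b_cons + b_x
p_x1 = 1.0 / (1.0 + math.exp(-eta_x1))

# Fitted proportion for the x = 0 group (r = 0, n = 100).
p_x0 = 1.0 / (1.0 + math.exp(-b_cons))

print(f"odds ratio      : {odds_ratio:.0f}")
print(f"fitted p (x = 1): {p_x1:.6f}")
print(f"fitted p (x = 0): {p_x0:.3e}")
```

The fitted proportion for x = 1 reproduces the observed 10/100, while the fitted proportion for x = 0 is driven toward zero, which is why the individual coefficients are so large.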
The problem is that the assumptions upon which ML estimation is based
are not met here. I tried your examples with several other commercial
applications, as well as R, with the same results.
The bottom line is that there is nothing wrong with glm here.
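To illustrate why no finite ML estimate exists for the two-observation data, here is a minimal Python sketch (an illustration only, not how glm is implemented). It evaluates the binomial log likelihood along the path where the x = 1 group is fitted exactly at 10/100 and the constant is pushed toward minus infinity. The function names and the choice of path are assumptions made for the example.

```python
import math

# Two binomial observations from the example: (r, n, x).
DATA = [(10, 100, 1), (0, 100, 0)]

def log_binom_coef(n, r):
    """Log of the binomial coefficient C(n, r), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def loglik(b0, b1, data=DATA):
    """Binomial log likelihood for a logit model with constant b0 and slope b1."""
    total = 0.0
    for r, n, x in data:
        eta = b0 + b1 * x
        log_p = -math.log1p(math.exp(-eta))   # log of fitted probability
        log_q = -math.log1p(math.exp(eta))    # log of 1 - fitted probability
        total += log_binom_coef(n, r) + r * log_p + (n - r) * log_q
    return total

# Keep the x = 1 group at its observed proportion, 10/100.
LOGIT_P1 = math.log(0.1 / 0.9)

def profile(b0):
    """Log likelihood with b0 + b1 held at logit(0.1)."""
    return loglik(b0, LOGIT_P1 - b0)

for b0 in (-5, -10, -20, -26.07444):
    print(f"b0 = {b0:10.5f}   log likelihood = {profile(b0):.9f}")
```

Along this path the log likelihood keeps increasing as the constant decreases, so an iterative maximizer stops only when its convergence tolerance is met, not at a true maximum. The coefficients it reports at that point are essentially arbitrary.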
Joseph Hilbe
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/