Statalist



st: RE: RE: curious behavior of glm


From   "Mak, Timothy" <timothy.mak07@imperial.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: curious behavior of glm
Date   Fri, 5 Jun 2009 17:37:31 +0100

Perhaps I'm not making myself clear. There are two issues in my original post. 

1. Why is it that -glm- refuses to calculate a binomial proportion (when r != 0 & r != n)? 
2. Why doesn't -glm- give an error message and give up (as would -logit-) in a case where the coefficients are clearly non-estimable by ML? 

Both problems are trivial, and there are easy ways to work around them. I was just hoping that, if there is no theoretical reason for -glm- to behave this way, a future update might make -glm- behave more like -logit- in these situations. 
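
As a rough illustration of one such workaround for the single-observation case: when 0 < r < n, the intercept-only binomial logit fit has a closed form, so no iteration is needed. The sketch below is in Python rather than Stata, and the function name intercept_only_logit is made up for this example; it is not a description of what -glm- or -logit- do internally.

```python
import math

def intercept_only_logit(r, n):
    """Closed-form ML fit of an intercept-only binomial logit model.

    Hypothetical helper for illustration: r successes out of n trials.
    """
    if not 0 < r < n:
        # At r == 0 or r == n the ML estimate of the intercept is infinite.
        raise ValueError("ML estimate is not finite when r == 0 or r == n")
    p_hat = r / n
    coef = math.log(r / (n - r))             # logit of the observed proportion
    se = math.sqrt(1.0 / r + 1.0 / (n - r))  # from the inverse information
    return p_hat, coef, se

p_hat, coef, se = intercept_only_logit(10, 100)
print(f"p = {p_hat:.4f}, _cons = {coef:.5f}, se = {se:.5f}")
```

For r = 10 and n = 100 this gives an intercept of ln(10/90), about -2.19722, with a standard error of 1/3.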

Yours, 

Tim

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of jhilbe@aol.com
Sent: 05 June 2009 17:09
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: curious behavior of glm

Regarding the estimation of (1) a single-observation logistic model and 
(2) a two-observation logistic model, both in binomial form with y as 
the binomial numerator and n as the denominator:

When you use -cii-, or work with a simple case where the estimated 
coefficient or odds ratio is computed directly from the binomial PDF, 
you are of course more likely to get a meaningful result. Using maximum 
likelihood entails assumptions which are not met in such a situation. 
In fact, you cannot even get results using exact logistic regression 
via the -exlogistic- command. On the other hand, -exlogistic- does 
estimate the second situation, where you have two observations, each 
with response y, binomial denominator n, and binary predictor x. 
However, you do not get exact values, but rather median unbiased 
estimates.

  y    n   x
--------------
 10  100   1
  0  100   0

Model the above using -exlogistic-:


. input y n x

              y          n          x

  1. 10 100 1
  2. 0 100 0
  3. end


. exlogistic y x, binomial(n) coef estc

Enumerating sample-space combinations:
observation 1:   enumerations =         11
observation 2:   enumerations =        101
observation 3:   enumerations =      10201
note: CMLE estimate for x is +inf; computing MUE
note: CMLE estimate for _cons is -inf; computing MUE
note: .975 quantile estimate for _cons failed to bracket the value

Exact logistic regression                        Number of obs =        200
Binomial variable: n                             Model score   =   10.47368
                                                 Pr >= score   =     0.0015
---------------------------------------------------------------------------
           y |      Coef.       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
           x |   2.722305*         10      0.0015      .8727845        +Inf
       _cons |          0*         10      0.0000         -Inf         +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)

I requested estimation of a constant although it is obvious that it is 
not meaningful in such a situation.
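
The test statistics printed for x can be checked by hand, which may help in reading the table. The sketch below is Python, not Stata, and rests on two assumptions that the output does not spell out: that 2*Pr(Suff.) is twice the one-sided conditional (hypergeometric) probability of the observed sufficient statistic given the total number of successes, and that the model score is the Pearson chi-square of the 2x2 table scaled by (N - 1)/N. Both happen to reproduce the printed values, but treat them as a reading of the output rather than a statement of how -exlogistic- is implemented.

```python
from math import comb

# Counts from the example: 10/100 successes at x = 1, 0/100 at x = 0.
n1, n0, total_successes = 100, 100, 10

def cond_prob(t):
    """P(t of the successes fall in the x = 1 group | total successes)."""
    return (comb(n1, t) * comb(n0, total_successes - t)
            / comb(n1 + n0, total_successes))

# The observed sufficient statistic for x is 10, the largest possible value,
# so the upper tail consists of this single point.
p_obs = cond_prob(10)
two_sided = 2 * p_obs
print(f"Pr(Suff. = 10) = {p_obs:.6f}   2*Pr(Suff.) = {two_sided:.4f}")

# Pearson chi-square for the 2x2 table (cells 10, 90, 0, 100),
# scaled by (N - 1)/N (assumed form of the conditional score statistic).
N = n1 + n0
pearson = N * (10 * 100 - 90 * 0) ** 2 / (100 * 100 * 10 * 190)
score = pearson * (N - 1) / N
print(f"score = {score:.5f}")
```

Under these assumptions the two-sided probability rounds to 0.0015 and the score to 10.47368, in line with the table.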

Compare the above with the clearly mistaken "estimated coefficients" 
that you provided in your output.

. glm r x, fam(bin n)

Generalized linear models                          No. of obs      =         2
Optimization     : ML                              Residual df     =         0
                                                   Scale parameter =         1
Deviance         =  2.00000e-08                    (1/df) Deviance =         .
Pearson          =  1.00000e-08                    (1/df) Pearson  =         .

Variance function: V(u) = u*(1-u/n)                [Binomial]
Link function    : g(u) = ln(u/(n-u))              [Logit]

                                                   AIC             =  4.025974
Log likelihood   = -2.025973987                    BIC             =  2.00e-08

------------------------------------------------------------------------------
             |                 OIM
           r |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   23.87722      10000     0.00   0.998    -19575.76    19623.52
       _cons |  -26.07444      10000    -0.00   0.998    -19625.71    19573.56
------------------------------------------------------------------------------


These coefficients indicate a problem with convergence. Exponentiate to 
obtain an odds ratio:

. di %12.0f exp(23.87722)
 23428521860

We have an odds ratio here of some 23.4 billion. No surprise.

The problem is that the assumptions upon which ML estimation is based 
are not met here. I tried your examples with several other commercial 
applications, as well as R, with the same results.
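
To see why no finite ML estimate exists for these data, one can trace the log likelihood as the intercept heads toward minus infinity. The following is a minimal sketch in Python, not Stata; the helper names binom_loglik and total_loglik are invented for the illustration. It holds _cons + x at logit(0.1), the fit for the x = 1 group, and lets _cons decrease.

```python
import math

def binom_loglik(r, n, eta):
    """Binomial log likelihood for r successes in n trials at logit eta."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(r + 1)
                  - math.lgamma(n - r + 1))
    # log(1 + exp(eta)), computed in a numerically stable way
    log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    return log_choose + r * eta - n * log1pexp

def total_loglik(cons, b_x):
    # Data from the example: 10/100 successes at x = 1, 0/100 at x = 0.
    return binom_loglik(10, 100, cons + b_x) + binom_loglik(0, 100, cons)

# Keep cons + b_x fixed at logit(0.1) and push the intercept downward.
eta1 = math.log(0.1 / 0.9)
profile = [(c, total_loglik(c, eta1 - c))
           for c in (-2.0, -5.0, -10.0, -20.0, -26.07444)]
for c, ll in profile:
    print(f"_cons = {c:10.5f}   log likelihood = {ll:.9f}")
```

The log likelihood keeps rising as _cons decreases and only approaches its supremum, so the iterations stop at an arbitrary large value rather than at a maximum. Note also that the reported coefficients sum to 23.87722 - 26.07444 = -2.19722, which is logit(0.1), and that the reported log likelihood of -2.025973987 is essentially the log of the binomial probability of 10 successes in 100 trials at p = 0.1.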

The bottom line is that there is nothing wrong with -glm- here.

Joseph Hilbe






*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
