Re: st: logistic reg version 8 & 11


From   kmacdonald@stata.com (Kristin MacDonald, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: logistic reg version 8 & 11
Date   Fri, 10 Sep 2010 11:21:17 -0500

Ricardo Ovaldia <ovaldia@yahoo.com> reported that he obtained different
results from -xi:logistic- in Stata 8 and Stata 11. 

Maarten Buis <maartenbuis@yahoo.co.uk> and Oliver Jones
<ojones@wiwi.uni-bielefeld.de> also mentioned that they had noticed some
differences and believed that the algorithm that -logistic- is using has
changed.

In Stata 11, the -logistic- command evaluates the same likelihood as in
previous versions.  The only difference is the path it may take on its way to
the solution.  Prior to Stata 11, -logit- (and hence -logistic-) used Stata's
internal optimizer, which does not employ the telescoped stepping implemented
in -ml-.  In Stata 11, -logit- uses -ml- to fit logistic regression (but
preserves the original algorithm under version control).
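
As a rough illustration of the version-control point, the sketch below fits
one model under both version settings and compares the maximized log
likelihoods.  It assumes Stata 11 or later; the model (foreign on mpg and
weight) and the scalar names ll_old and ll_new are our own choices for
illustration and are not taken from Ricardo's example.

```stata
sysuse auto, clear

* fit under version control, which requests the pre-Stata 11 algorithm
version 10: logistic foreign mpg weight
scalar ll_old = e(ll)

* fit with the Stata 11 behavior, where -logit- is implemented through -ml-
version 11: logistic foreign mpg weight
scalar ll_new = e(ll)

* for a well-behaved model, the two fits should agree closely
display "relative difference in log likelihood: " reldif(ll_old, ll_new)
```

A relative difference near zero would be consistent with the two algorithms
reaching the same solution by different paths.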

In Ricardo's case, the difference in results is actually evidence of an
underlying problem with using -xi- when the base category for the categorical
variable is also a perfect predictor.  In his second email, Ricardo provided
the following example using the auto dataset.  

***** BEGIN:
. sysuse auto, clear
(1978 Automobile Data)

. version 8

. xi:logistic  foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_1 omitted)

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used


Logistic regression                               Number of obs   =         61
                                                  LR chi2(3)      =      23.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.3012

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_3 |   1.03e+07          .        .       .            .           .
   _Irep78_4 |   9.24e+07   7.12e+07    23.83   0.000     2.04e+07    4.18e+08
   _Irep78_5 |   4.16e+08   4.12e+08    20.03   0.000     5.97e+07    2.90e+09
------------------------------------------------------------------------------
Note: 2 failures and 0 successes completely determined.

. version 11

. xi:logistic  foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_1 omitted)
note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used


Logistic regression                               Number of obs   =         61
                                                  LR chi2(3)      =      23.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.3012

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_2 |  (omitted)
   _Irep78_3 |   788893.3   1.49e+09     0.01   0.994            0           .
   _Irep78_4 |    7101032   1.34e+10     0.01   0.993            0           .
   _Irep78_5 |   3.20e+07   6.02e+10     0.01   0.993            0           .
------------------------------------------------------------------------------
***** END:

Note that -xi- has, by default, chosen to omit the indicator for rep78==1,
treating this as the base category.  The indicator variable for rep78==2 is
dropped because it predicts failure perfectly.  What is not obvious from this
output is that rep78==1 perfectly predicts failure as well.  We can see from
the following tabulation that the two observations with rep78==1 also have
foreign==0.

***** BEGIN:
. tab rep78 foreign, nolabel

    Repair |
    Record |       Car type
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69 
***** END:

If rep78==1 were not the base category, this indicator and the two
corresponding observations would have been dropped as well.  Because -xi-
has already omitted the indicator for rep78==1 from the model, the
-logistic- command never sees it and cannot check it for perfect prediction.
Therefore, both Stata 8 and Stata 11 try to fit the model with the two
additional observations and all three remaining indicators, and both have
trouble, as evidenced by the large odds ratios and missing or large standard
errors.  
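
One possible way to check every category, including the one that -xi- omits,
before fitting the model is sketched here.  The variable name pforeign is
hypothetical and chosen only for this illustration.

```stata
sysuse auto, clear

* share of foreign cars within each level of rep78
bysort rep78: egen double pforeign = mean(foreign)

* a share of exactly 0 or 1 means the outcome does not vary within
* that level, so the level predicts the outcome perfectly
tabulate rep78 if inlist(pforeign, 0, 1) & !missing(rep78)
```

Given the tabulation above, we would expect levels 1 and 2 to be listed, so
neither is a safe choice of base category with -xi-.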

If we use a base category that is not a perfect predictor, we will see
that Stata 8 and Stata 11 will produce equivalent results.  Here, we use the
-char- command to set the base category to rep78==5.

***** BEGIN:
. char rep78[omit] 5

. version 8

. xi: logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_5 omitted)

note: _Irep78_1 != 0 predicts failure perfectly
      _Irep78_1 dropped and 2 obs not used

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used


Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_3 |   .0246914   .0244617    -3.74   0.000     .0035421    .1721186
   _Irep78_4 |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
------------------------------------------------------------------------------

. version 11

. xi: logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_5 omitted)
note: _Irep78_1 != 0 predicts failure perfectly
      _Irep78_1 dropped and 2 obs not used

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used


Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_1 |  (omitted)
   _Irep78_2 |  (omitted)
   _Irep78_3 |   .0246914   .0244617    -3.74   0.000     .0035421    .1721188
   _Irep78_4 |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
------------------------------------------------------------------------------
***** END:

We could have also used -xi, noomit- and allowed -logistic- to drop one of the
indicator variables because of collinearity.  
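
A minimal sketch of that alternative follows.  We reload the data so that no
omit characteristic is set on rep78; which indicator ends up dropped is left
to -logistic-, so check the notes printed above the coefficient table rather
than assuming a particular base category.

```stata
sysuse auto, clear

* keep all five indicators and let -logistic- decide which terms to drop
xi, noomit: logistic foreign i.rep78
```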

The good news is that when we use factor variables instead of -xi- in Stata
11, we will not run into this issue.  The -xi- prefix creates new variables
first and then passes these to -logistic-.  Factor variables, on the other
hand, are integrated into the estimation command.  Therefore, -logistic- is
aware that there are 5 categories for rep78 and can check all of them for
perfect prediction, even if you have specified one of these as the base
category.  Here, we try to specify rep78==1 as the base category, but
-logistic- drops the observations associated with this indicator because it
predicts failure perfectly.  Then it recognizes that the three remaining
indicators are perfectly collinear with the constant and omits 5.rep78 as
well. 

***** BEGIN:
. logistic foreign ib1.rep78
note: 1.rep78 != 0 predicts failure perfectly
      1.rep78 dropped and 2 obs not used

note: 2.rep78 != 0 predicts failure perfectly
      2.rep78 dropped and 8 obs not used

note: 5.rep78 omitted because of collinearity

Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  (empty)  
          2  |  (empty)  
          3  |   .0246914   .0244617    -3.74   0.000     .0035421    .1721188
          4  |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
          5  |  (omitted)
------------------------------------------------------------------------------
***** END:
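
As a further sketch, assuming Stata 11 or later, the base category can be
requested directly with the ib#. operator instead of -char-.  With rep78==5
as the base, we would expect the odds ratios for levels 3 and 4 to match
those reported above, because levels 1 and 2 are still dropped for perfect
prediction.

```stata
sysuse auto, clear

* request rep78==5 as the base category using factor-variable notation
logistic foreign ib5.rep78
```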

-- Kristin					-- Jeff
kmacdonald@stata.com				jpitblado@stata.com
