

From: kmacdonald@stata.com (Kristin MacDonald, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic reg version 8 & 11
Date: Fri, 10 Sep 2010 11:21:17 -0500

Ricardo Ovaldia <ovaldia@yahoo.com> reported that he obtained different results from -xi:logistic- in Stata 8 and Stata 11. Maarten Buis <maartenbuis@yahoo.co.uk> and Oliver Jones <ojones@wiwi.uni-bielefeld.de> also mentioned that they had noticed some differences and believed that the algorithm that -logistic- is using has changed.

In Stata 11, the -logistic- command is evaluating the same likelihood as in previous versions. The only difference is the path it may take on its way to the solution. Prior to Stata 11, -logit- (and hence -logistic-) used Stata's internal optimizer, which does not employ the telescoped stepping implemented in -ml-. In Stata 11, -logit- now uses -ml- to fit logistic regression (but preserves the original algorithm under version control).

In Ricardo's case, the difference in results is actually evidence of an underlying problem with using -xi- when the base category for the factor variable is also a perfect predictor. In his second email, Ricardo provided the following example using the auto dataset.

***** BEGIN:

. sysuse auto, clear
(1978 Automobile Data)

. version 8

. xi:logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_1 omitted)

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used

Logistic regression                               Number of obs   =         61
                                                  LR chi2(3)      =      23.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.3012

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_3 |   1.03e+07          .        .       .            .           .
   _Irep78_4 |   9.24e+07   7.12e+07    23.83   0.000     2.04e+07    4.18e+08
   _Irep78_5 |   4.16e+08   4.12e+08    20.03   0.000     5.97e+07    2.90e+09
------------------------------------------------------------------------------
Note: 2 failures and 0 successes completely determined.

. version 11

. xi:logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_1 omitted)

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used

Logistic regression                               Number of obs   =         61
                                                  LR chi2(3)      =      23.66
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.3012

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_2 |  (omitted)
   _Irep78_3 |   788893.3   1.49e+09     0.01   0.994            0           .
   _Irep78_4 |    7101032   1.34e+10     0.01   0.993            0           .
   _Irep78_5 |   3.20e+07   6.02e+10     0.01   0.993            0           .
------------------------------------------------------------------------------

***** END:

Note that -xi- has, by default, chosen to omit the indicator for rep78==1, treating this as the base category. The indicator variable for rep78==2 is dropped because it predicts failure perfectly. What is not obvious from this output is that rep78==1 perfectly predicts failure as well. We can see from the following tabulation that the two observations with rep78==1 also have foreign=0.

***** BEGIN:

. tab rep78 foreign, nolabel

    Repair |
    Record |       Car type
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
-----------+----------------------+----------
     Total |        48         21 |        69

***** END:

If rep78==1 were not the base category, this indicator and the two corresponding observations would have been dropped as well. Since the indicator for rep78==1 was already omitted from the model by -xi-, the -logistic- command is not able to check for perfect prediction. Therefore, both Stata 8 and Stata 11 try to estimate the model with the two additional observations and all three remaining indicators but have trouble as evidenced by the large odds ratios and missing or large standard errors.

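Before handing indicators to -xi-, one way to spot levels that predict the outcome perfectly is to count successes within each level. The following is only a minimal sketch for Stata 11, assuming the auto dataset from the example above; the variable names nforeign and ncars are made up for illustration and are not part of Ricardo's example.

```stata
sysuse auto, clear

* Number of foreign cars and number of cars within each level of rep78
* (nforeign and ncars are illustrative names, not from the original post)
bysort rep78: egen nforeign = total(foreign)
bysort rep78: generate ncars = _N

* Levels in which foreign never varies predict the outcome perfectly;
* -tabulate- leaves out observations with missing rep78 by default
tabulate rep78 if nforeign == 0 | nforeign == ncars
```

Given the tabulation above, this should list levels 1 and 2, so neither would be a safe choice of base category for -xi-.
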
If we use a base category that is not a perfect predictor, we will see that Stata 8 and Stata 11 will produce equivalent results. Here, we use the -char- command to set the base category to rep78==5.

***** BEGIN:

. char rep78[omit] 5

. version 8

. xi: logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_5 omitted)

note: _Irep78_1 != 0 predicts failure perfectly
      _Irep78_1 dropped and 2 obs not used

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used

Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_3 |   .0246914   .0244617    -3.74   0.000     .0035421    .1721186
   _Irep78_4 |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
------------------------------------------------------------------------------

. version 11

. xi: logistic foreign i.rep78
i.rep78           _Irep78_1-5         (naturally coded; _Irep78_5 omitted)

note: _Irep78_1 != 0 predicts failure perfectly
      _Irep78_1 dropped and 2 obs not used

note: _Irep78_2 != 0 predicts failure perfectly
      _Irep78_2 dropped and 8 obs not used

Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Irep78_1 |  (omitted)
   _Irep78_2 |  (omitted)
   _Irep78_3 |   .0246914   .0244617    -3.74   0.000     .0035421    .1721188
   _Irep78_4 |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
------------------------------------------------------------------------------

***** END:

We could have also used -xi, noomit- and allowed -logistic- to drop one of the indicator variables because of collinearity.

The good news is that when we use factor variables instead of -xi- in Stata 11, we will not run into this issue. The -xi- prefix creates new variables first and then passes these to -logistic-. Factor variables, on the other hand, are integrated into the estimation command. Therefore, -logistic- is aware that there are 5 categories for rep78 and can check all of them for perfect prediction, even if you have specified one of these as the base category.

Here, we try to specify rep78==1 as the base category, but -logistic- drops the observations associated with this indicator because it predicts failure perfectly. Then it recognizes that the three remaining indicators are perfectly collinear with the constant and omits 5.rep78 as well.

***** BEGIN:

. logistic foreign ib1.rep78
note: 1.rep78 != 0 predicts failure perfectly
      1.rep78 dropped and 2 obs not used

note: 2.rep78 != 0 predicts failure perfectly
      2.rep78 dropped and 8 obs not used

note: 5.rep78 omitted because of collinearity

Logistic regression                               Number of obs   =         59
                                                  LR chi2(2)      =      21.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -27.444671                       Pseudo R2       =     0.2855

------------------------------------------------------------------------------
     foreign | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  (empty)
          2  |  (empty)
          3  |   .0246914   .0244617    -3.74   0.000     .0035421    .1721188
          4  |   .2222222   .2028602    -1.65   0.099     .0371322    1.329917
          5  |  (omitted)
------------------------------------------------------------------------------

***** END:

-- Kristin                  -- Jeff
kmacdonald@stata.com        jpitblado@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

