Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: perfect prediction in -logit-?
Date: Thu, 5 May 2011 15:07:28 -0400

Dear listers,

what are good ways to diagnose perfect prediction in -logit- and -mlogit- models in batch mode? RTFMing gives the following examples:

    * Example 1: one-way causation by a dummy variable
    use http://www.stata-press.com/data/r11/repair, clear
    logit foreign b3.repair

    * Example 2: causation by a great linear predictor
    use http://www.stata-press.com/data/r11/auto, clear
    drop if foreign==0 & gear_ratio > 3.1
    logit foreign mpg weight gear_ratio

    * Example 3: weird covariate pattern
    use http://www.stata-press.com/data/r11/logitxmpl, clear
    logit y x1 x2, iter(50)

    * Example 4: weak identification in mlogit
    use http://www.stata-press.com/data/r11/auto, clear
    mlogit rep78 foreign mpg weight
    tabulate rep78, generate( rep78d )
    logit rep78d1 foreign mpg weight

I believe I can find information about these issues in -e(rules)- matrix, but since it is not really documented, even in the fine manuals, I am sort of guessing what its function is by looking at -matrix list e(rules)- after -logit-. For the models that don't have any issues, it is a generic 1x4 matrix with no row/colnames. For models with problems due to a particular variable, it gives the name(s) of the culprit variable(s), the values that lead to perfect prediction, and the number of observations that had to be removed for -logit- to run.

The causation by strong predictor in Example 2 is not reflected in -e(rules)-, however; there are no infinite coefficients and standard errors, so the problem is really far into the tails of the distribution of the linear predictor where Stata simply runs out of digits in computing something like 1-c(epsdouble) (which happens when the linear predictor exceeds abs(ln(c(epsdouble))) = 36 in absolute value).
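To make the tail argument for Example 2 concrete, here is a minimal sketch of how one might check it in batch mode. It is an illustration under assumptions, not a documented diagnostic: the scalar name xbmax and the variable name xbhat are made up for this example, and the cutoff is simply the abs(ln(c(epsdouble))) value quoted above. No particular count is claimed for the result.

```stata
* Sketch only: count observations whose fitted linear predictor lies beyond
* the point where 1/(1+exp(-xb)) can no longer be told apart from 0 or 1
* in double precision.
use http://www.stata-press.com/data/r11/auto, clear
drop if foreign==0 & gear_ratio > 3.1
logit foreign mpg weight gear_ratio

* Cutoff taken from the text: abs(ln(c(epsdouble))), roughly 36
scalar xbmax = abs(ln(c(epsdouble)))
display "cutoff on the linear predictor: " scalar(xbmax)

* xbhat is a hypothetical variable name; xb requests the linear prediction
predict double xbhat if e(sample), xb
count if abs(xbhat) > scalar(xbmax) & !missing(xbhat)
display "observations beyond the cutoff: " r(N)

* The ereturned results discussed in this post, for comparison
matrix list e(rules)
display "e(rank) = " e(rank) "   e(k) = " e(k)
```

In a do-file, the final count could be wrapped in an -assert r(N) == 0- so that the batch job stops when any observation sits that far out in the tails.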
The problem with lack of convergence in Example 3 is unfortunately not reflected in -e(rules)-, either, although in this particular case I can also figure out that Stata could not estimate all of the coefficients:

    assert e(rank) == e(k)

where the RHS is what Stata wanted to estimate (the number of parameters), and the LHS is what it really could estimate (the rank of the resulting VCE).

Note that Example 4 is more subtle. -mlogit- did not declare any convergence or perfect prediction issues, although I believe it should have. There are only two observations with rep78==1, so I don't really see how Stata (or any other software, apart from WinBUGS, which would simply reproduce the prior in this case) could estimate the equation for that outcome. As we see in the logit regression for that cell, the -foreign- variable predicts the negative outcome perfectly, but I am still at a loss as to how Stata came up with three coefficients based on just two points.

Anyway, back to -mlogit-: it reports obscenely large standard errors on the -foreign- variable (3000 for the rep78==1 equation; 1500 for the rep78==2 equation), and that would be a numeric accuracy concern to me (for real fun, try this -mlogit- with -basecategory(1)-). However, lacking the powerful identification diagnostic tools of -logit-, it does not say anything to raise a brow, either in the output or in the -ereturn-ed values.

So back to my question. I want to detect issues like lack of convergence in -mlogit-, and figure out whether I can point to any of the explanatory variables that I can blame, -logit-style. Is that doable?

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
