

Re: st: Factor variable notation vs. hand made dummy vars

From   Richard Williams
Subject   Re: st: Factor variable notation vs. hand made dummy vars
Date   Mon, 06 Feb 2012 11:22:38 -0500

At 10:41 AM 2/6/2012, Brendan Halpin wrote:
To put the "why" back one step, the immediate reason is evident from the

| . logit for mpg d2-d5
| note: d2 != 0 predicts failure perfectly
|       d2 dropped and 8 obs not used
| [...]
| . logit for mpg ib1.rep78
| note: 1.rep78 != 0 predicts failure perfectly
|       1.rep78 dropped and 2 obs not used
| note: 2.rep78 != 0 predicts failure perfectly
|       2.rep78 dropped and 8 obs not used
| note: 5.rep78 omitted because of collinearity
| [...]

You end up fitting different models on different data.

The question is now why do the formulations behave differently, and
which is the better default?
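For anyone who wants to reproduce the comparison quoted above, here is a minimal sketch. It assumes the example uses Stata's shipped auto dataset and that d1-d5 were created with -tabulate, generate()-; neither is stated in the excerpt, and the scalar names n_dummy and n_factor are only illustrative.

```stata
* Assumption: the quoted output comes from the shipped auto dataset.
sysuse auto, clear

* Hand-made dummies d1-d5, one per observed level of rep78 (assumed construction).
tabulate rep78, generate(d)

* Cross-tab to see which repair-record categories contain no foreign cars.
tabulate rep78 foreign

* Hand-made dummies, with d1 left out as the reference category.
logit foreign mpg d2-d5
scalar n_dummy = e(N)

* Factor-variable notation, with level 1 as the base.
logit foreign mpg ib1.rep78
scalar n_factor = e(N)

* Compare the estimation samples actually used by the two commands.
display "N with hand-made dummies: " n_dummy
display "N with factor variables:  " n_factor
```

If the two values of e(N) differ, the two commands are not fitting the same model on the same observations, which is the point made above.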

To clarify my last answer: my guess is that in the vast majority of cases it won't matter which approach you use. This particular example is problematic, though, because of the very small category Ns and the perfect prediction issues. If you are in a situation where it matters, you may want to recode the problematic variable (e.g., combine categories to dichotomize it) or consider an alternative technique such as -exlogistic-, which, as the manual says, "produces more-accurate inference in small samples because it does not depend on asymptotic results and exlogistic can better deal with one-way causation, such as the case where all females are observed to have a positive outcome."
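As a rough illustration of those two options, the sketch below again assumes the auto dataset. The cut point (1-3 versus 4-5) and the variable name rep_hi are arbitrary choices made for the example, not recommendations.

```stata
* Assumption: same auto dataset as in the quoted example.
sysuse auto, clear

* Option 1: combine the sparse categories into a dichotomy.
* The 1-3 vs. 4-5 split and the name rep_hi are illustrative only.
recode rep78 (1/3 = 0 "3 or lower") (4/5 = 1 "4 or higher"), generate(rep_hi)

* Check that neither recoded category predicts the outcome perfectly.
tabulate rep_hi foreign

* Ordinary logit on the recoded variable.
logit foreign mpg i.rep_hi

* Option 2: exact logistic regression on the binary covariate.
* mpg is left out here because exact methods can become very slow or
* memory-hungry with continuous covariates.
exlogistic foreign rep_hi
```

Whether collapsing categories is substantively defensible depends on the variable; the point is only that either route avoids having observations dropped because of perfect prediction.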

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
