


Re: st: Factor variable notation vs. hand made dummy vars

From   Richard Williams
Subject   Re: st: Factor variable notation vs. hand made dummy vars
Date   Mon, 06 Feb 2012 10:53:55 -0500

At 10:41 AM 2/6/2012, Brendan Halpin wrote:
To put the "why" back one step, the immediate reason is evident from the output:

| . logit for mpg d2-d5
| note: d2 != 0 predicts failure perfectly
|       d2 dropped and 8 obs not used
| [...]
| . logit for mpg ib1.rep78
| note: 1.rep78 != 0 predicts failure perfectly
|       1.rep78 dropped and 2 obs not used
| note: 2.rep78 != 0 predicts failure perfectly
|       2.rep78 dropped and 8 obs not used
| note: 5.rep78 omitted because of collinearity
| [...]

You end up fitting different models on different data.

The question is now why do the formulations behave differently, and
which is the better default?


I would use factor variable notation. -logit- is doing some error checking, and some errors are creeping through with the first notation. Note that the seemingly identical commands

glm for mpg d2-d5, link(logit) family(binomial)
glm for mpg ib1.rep78, link(logit) family(binomial)

don't produce any error messages, nor do they drop any cases, but the standard errors are monstrous. -glm- is not doing the same checks that -logit- is. But then again, you probably shouldn't be using rep78 this way in the first place, as the category Ns are way too small. If you must use a variable like this, you probably want to combine some categories.
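
As a rough sketch of what combining categories might look like: the code below assumes the variables in this thread come from Stata's shipped auto dataset (where -for- abbreviates foreign). The grouping of rep78 into three levels and the new variable name rep3 are purely illustrative assumptions, not something proposed in the thread.

```stata
* Sketch only: assumes the thread's variables are from the auto dataset
sysuse auto, clear

* Inspect the cell counts first; sparse or empty cells are the source of the trouble
tabulate rep78 foreign

* Hypothetical grouping: merge the sparse low categories 1-3 into one level
recode rep78 (1/3 = 1 "3 or below") (4 = 2 "4") (5 = 3 "5"), generate(rep3)

* Confirm that every combined category now contains both outcomes
tabulate rep3 foreign

* Refit with factor variable notation on the collapsed variable
logit foreign mpg i.rep3
```

Which categories to merge is a substantive decision; the cross-tabulations are there to check that no level of the collapsed variable still predicts the outcome perfectly.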

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu

