# Re: st: Factor variable notation vs. hand made dummy vars

From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Factor variable notation vs. hand made dummy vars
Date: Mon, 06 Feb 2012 10:11:58 -0600

Ulrich Kohler <kohler@wzb.eu> is comparing results from -logit- between two
different specifications of what seem to be the same model, but is getting
different results:

> I cannot replicate the model
>
> . sysuse auto, clear
> . tab rep78, gen(d)
> . logit for mpg d2-d5
>
> with factor variable notation. I tried
>
> . logit for mpg ib1.rep78
>
> but results differ. Can anybody explain why?
>
> (Note as an aside that
>
> . logit for mpg d1-d5
>
> reproduces the factor variables solution, but normally I would not
> specify the model this way)

Here is the output from Uli's first model:

***** BEGIN:
. logit for mpg d2-d5

note: d2 != 0 predicts failure perfectly
d2 dropped and 8 obs not used

Iteration 0:   log likelihood = -39.273156
Iteration 1:   log likelihood = -26.016988
Iteration 2:   log likelihood = -25.527683
Iteration 3:   log likelihood = -25.487362
Iteration 4:   log likelihood = -25.480362
Iteration 5:   log likelihood = -25.478768
Iteration 6:   log likelihood = -25.478391
Iteration 7:   log likelihood = -25.478309
Iteration 8:   log likelihood = -25.478292
Iteration 9:   log likelihood = -25.478288
Iteration 10:  log likelihood = -25.478287

Logistic regression                               Number of obs   =         61
                                                  LR chi2(4)      =      27.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -25.478287                       Pseudo R2       =     0.3513

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
          d2 |          0  (omitted)
          d3 |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
          d4 |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
          d5 |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
       _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
------------------------------------------------------------------------------
***** END:

Technically, this model should not have converged.  The coefficients on the
binary predictors are way too big; the standard errors don't look reasonable
either.

The problem here is that 'd1' and 'd2' are perfect predictors for 'foreign', but
Uli dropped 'd1' from the list of predictors.  Dropping a level from the
indicators of a factor variable is normally a natural thing to want to do. One
of the levels is going to be omitted because of collinearity anyway, so by
dropping you can control which level to treat as the base level for the fitted
coefficient effects of the factor variable.  But 'd1' is a perfect predictor,
so -logit- would have dropped it along with 'd2' (and the observations they
indicate) for that reason and then found that it still needed to drop one of
the other 'd#' variables because of collinearity.  However, by not including
'd1' in the list of predictors, the observations that 'd1' indicates are left
in the estimation sample, and -logit- is unable to identify that it has a
collinearity problem.
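
One way to see this directly is to cross-tabulate 'rep78' against 'foreign' and
then check which levels stay in the estimation sample of the first
specification. This is a minimal sketch, assuming the same auto dataset and the
'd1'-'d5' indicators that Uli generated with -tab, gen(d)-; it is meant as an
illustration, not as output from the original exchange.

```stata
sysuse auto, clear
tabulate rep78, generate(d)

* Cross-tabulate to see which repair-record levels contain no foreign cars
tabulate rep78 foreign

* Refit the first specification quietly, then list the rep78 levels that
* remain in the estimation sample; level 1 is expected to still be present
quietly logit foreign mpg d2-d5
tabulate rep78 if e(sample)
```

If the reasoning above holds, the rows for 'rep78' equal to 1 and 2 in the
first table should show no foreign cars, and the second table should still
list the observations with 'rep78' equal to 1.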

We can prevent this by adding 'd1' back in to the list of predictors:

***** BEGIN:
. logit for mpg d1-d5

note: d1 != 0 predicts failure perfectly
d1 dropped and 2 obs not used

note: d2 != 0 predicts failure perfectly
d2 dropped and 8 obs not used

note: d5 omitted because of collinearity
Iteration 0:   log likelihood = -38.411464
Iteration 1:   log likelihood = -25.814503
Iteration 2:   log likelihood = -25.480135
Iteration 3:   log likelihood = -25.478287
Iteration 4:   log likelihood = -25.478287

Logistic regression                               Number of obs   =         59
                                                  LR chi2(3)      =      25.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -25.478287                       Pseudo R2       =     0.3367

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1310946    .070733     1.85   0.064    -.0075396    .2697287
          d1 |          0  (omitted)
          d2 |          0  (omitted)
          d3 |  -3.136422   1.044601    -3.00   0.003    -5.183803    -1.08904
          d4 |  -1.119903   .9741478    -1.15   0.250    -3.029198    .7893916
          d5 |          0  (omitted)
       _cons |  -1.723275   1.776453    -0.97   0.332    -5.205059    1.758509
------------------------------------------------------------------------------
***** END:

Uli already mentioned that this specification reproduces the results from the
one using factor variables.
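
A quick arithmetic check, using only the coefficients printed in the two
tables above, suggests that both fits describe essentially the same model for
the cars with 'rep78' equal to 3, 4, or 5. Adding the constant to each
indicator coefficient gives the intercept for that level; the sketch below
simply copies the reported numbers into -display- and is an illustration, not
part of the original exchange.

```stata
* First model (d2-d5): level intercepts are _cons + coefficient on d#
display 14.28187 + (-19.14137)
display 16.29835 + (-19.14137)
display 17.41793 + (-19.14137)

* Second model (d1-d5): d5 is the omitted base, so its intercept is _cons
display -3.136422 + (-1.723275)
display -1.119903 + (-1.723275)
display -1.723275
```

Both sets come out at roughly -4.86, -2.84, and -1.72, and the two fits report
the same log likelihood. On this reading, the constant of about -19.14 in the
first model is the intercept for the two all-domestic cars with 'rep78' equal
to 1, which the optimizer is pushing toward minus infinity; that would explain
the huge coefficients and standard errors.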

We do not recommend this, but Uli can reproduce the first model specification
using factor variables notation by explicitly specifying the levels of 'rep78'
to use:

***** BEGIN:
. logit for mpg i(2/5).rep78

note: 2.rep78 != 0 predicts failure perfectly
2.rep78 dropped and 8 obs not used

Iteration 0:   log likelihood = -39.273156
Iteration 1:   log likelihood = -26.016988
Iteration 2:   log likelihood = -25.527683
Iteration 3:   log likelihood = -25.487362
Iteration 4:   log likelihood = -25.480362
Iteration 5:   log likelihood = -25.478768
Iteration 6:   log likelihood = -25.478391
Iteration 7:   log likelihood = -25.478309
Iteration 8:   log likelihood = -25.478292
Iteration 9:   log likelihood = -25.478288
Iteration 10:  log likelihood = -25.478287

Logistic regression                               Number of obs   =         61
                                                  LR chi2(4)      =      27.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -25.478287                       Pseudo R2       =     0.3513

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
             |
       rep78 |
          2  |          0  (empty)
          3  |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
          4  |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
          5  |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
             |
       _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
------------------------------------------------------------------------------
***** END:

--Jeff