Re: st: Factor variable notation vs. hand made dummy vars

From   Ulrich Kohler <[email protected]>
To   [email protected]
Subject   Re: st: Factor variable notation vs. hand made dummy vars
Date   Mon, 06 Feb 2012 17:38:27 +0100


Thank you very much. The take-away message, then, is:

(1) Take care not to use a perfect predictor as the reference
category of a categorical explanatory variable in a logit/probit model.

(2) Because it is cumbersome to search for perfect predictors before
deciding on the reference category, it is better to use factor variables.
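
A quick way to see which levels of rep78 are perfect predictors before
picking a base level is a cross-tabulation against the outcome. This is
only a sketch on the same auto data; using level 3 as the base is an
assumption made for illustration, not something taken from Jeff's reply:

***** BEGIN:
. sysuse auto, clear
. * a level with an empty cell in either column of foreign is a perfect predictor
. tabulate rep78 foreign
. * pick a base level in which both outcomes occur (level 3 is assumed here)
. logit foreign mpg ib3.rep78
***** END:

With factor-variable notation -logit- still sees levels 1 and 2, so it
should drop them and their observations itself instead of leaving them in
the estimation sample.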


On Monday, 06.02.2012, at 10:11 -0600, Jeff Pitblado, StataCorp wrote:
> Ulrich Kohler <[email protected]> is comparing results from -logit- between two
> different specifications of what seem to be the same model, but is getting
> different results:
> > I cannot replicate the model 
> > 
> > . sysuse auto, clear
> > . tab rep78, gen(d)
> > . logit for mpg d2-d5
> > 
> > with factor variable notation. I tried
> > 
> > . logit for mpg ib1.rep78
> > 
> > but results differ. Can anybody explain why?
> > 
> > (Note as an aside that
> > 
> > . logit for mpg d1-d5
> > 
> > reproduces the factor variables solution, but normally I would not
> > specify the model this way)
> Here is the output from Uli's first model:
> ***** BEGIN:
> . logit for mpg d2-d5
> note: d2 != 0 predicts failure perfectly
>       d2 dropped and 8 obs not used
> Iteration 0:   log likelihood = -39.273156  
> Iteration 1:   log likelihood = -26.016988  
> Iteration 2:   log likelihood = -25.527683  
> Iteration 3:   log likelihood = -25.487362  
> Iteration 4:   log likelihood = -25.480362  
> Iteration 5:   log likelihood = -25.478768  
> Iteration 6:   log likelihood = -25.478391  
> Iteration 7:   log likelihood = -25.478309  
> Iteration 8:   log likelihood = -25.478292  
> Iteration 9:   log likelihood = -25.478288  
> Iteration 10:  log likelihood = -25.478287  
> Logistic regression                               Number of obs   =         61
>                                                   LR chi2(4)      =      27.59
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3513
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
>           d2 |          0  (omitted)
>           d3 |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
>           d4 |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
>           d5 |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
>        _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
> ------------------------------------------------------------------------------
> ***** END:
> Technically, this model should not have converged.  The coefficients on the
> binary predictors are way too big; the standard errors don't look reasonable
> either.
> The problem here is that 'd1' and 'd2' are perfect predictors of 'foreign', but
> Uli dropped 'd1' from the list of predictors.  Dropping a level from the
> indicators of a factor variable is normally a natural thing to want to do. One
> of the levels is going to be omitted because of collinearity anyway, so by
> dropping you can control which level to treat as the base level for the fitted
> coefficient effects of the factor variable.  But 'd1' is a perfect predictor,
> so -logit- would have dropped it along with 'd2' (and the observations they
> indicate) for that reason and then found that it still needed to drop one of
> the other 'd#' variables because of collinearity.  However by not including
> 'd1' in the list of predictors, the observations that 'd1' indicates are left
> in the estimation sample, and -logit- is unable to identify that it has a
> collinearity problem.
> We can prevent this by adding 'd1' back in to the list of predictors:
> ***** BEGIN:
> . logit for mpg d1-d5
> note: d1 != 0 predicts failure perfectly
>       d1 dropped and 2 obs not used
> note: d2 != 0 predicts failure perfectly
>       d2 dropped and 8 obs not used
> note: d5 omitted because of collinearity
> Iteration 0:   log likelihood = -38.411464  
> Iteration 1:   log likelihood = -25.814503  
> Iteration 2:   log likelihood = -25.480135  
> Iteration 3:   log likelihood = -25.478287  
> Iteration 4:   log likelihood = -25.478287  
> Logistic regression                               Number of obs   =         59
>                                                   LR chi2(3)      =      25.87
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3367
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310946    .070733     1.85   0.064    -.0075396    .2697287
>           d1 |          0  (omitted)
>           d2 |          0  (omitted)
>           d3 |  -3.136422   1.044601    -3.00   0.003    -5.183803    -1.08904
>           d4 |  -1.119903   .9741478    -1.15   0.250    -3.029198    .7893916
>           d5 |          0  (omitted)
>        _cons |  -1.723275   1.776453    -0.97   0.332    -5.205059    1.758509
> ------------------------------------------------------------------------------
> ***** END:
> Uli already mentioned that this specification reproduces the results from the
> one using factor variables.
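
The same sample restriction can also be made explicit with an -if-
qualifier instead of relying on -logit- to drop the perfectly predicted
observations. A sketch, assuming the d1-d5 indicators created by
-tab rep78, gen(d)- and taking d5 as the base, as in the output above:

***** BEGIN:
. * keep only the rep78 levels in which both outcomes of foreign occur
. logit foreign mpg d3 d4 if inrange(rep78, 3, 5)
***** END:

This should use the same 59 observations as the d1-d5 model, because the
2 + 8 observations flagged by d1 and d2 are excluded by hand.
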
> We do not recommend this, but Uli can reproduce the first model specification
> using factor variables notation by explicitly specifying the levels of 'rep78'
> to use:
> ***** BEGIN:
> . logit for mpg i(2/5).rep78
> note: 2.rep78 != 0 predicts failure perfectly
>       2.rep78 dropped and 8 obs not used
> Iteration 0:   log likelihood = -39.273156  
> Iteration 1:   log likelihood = -26.016988  
> Iteration 2:   log likelihood = -25.527683  
> Iteration 3:   log likelihood = -25.487362  
> Iteration 4:   log likelihood = -25.480362  
> Iteration 5:   log likelihood = -25.478768  
> Iteration 6:   log likelihood = -25.478391  
> Iteration 7:   log likelihood = -25.478309  
> Iteration 8:   log likelihood = -25.478292  
> Iteration 9:   log likelihood = -25.478288  
> Iteration 10:  log likelihood = -25.478287  
> Logistic regression                               Number of obs   =         61
>                                                   LR chi2(4)      =      27.59
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3513
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
>              |
>        rep78 |
>           2  |          0  (empty)
>           3  |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
>           4  |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
>           5  |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
>              |
>        _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
> ------------------------------------------------------------------------------
> ***** END:
> --Jeff
> [email protected]