Re: st: Factor variable notation vs. hand made dummy vars


From   Ulrich Kohler <kohler@wzb.eu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Factor variable notation vs. hand made dummy vars
Date   Mon, 06 Feb 2012 17:38:27 +0100

Jeff,

thank you very much. The take-home message, then, is:

(1) Take care not to use a perfect predictor as the reference
category of a categorical explanatory variable in a logit/probit model.

(2) Because it is cumbersome to search for perfect predictors before deciding
on the reference category, it is better to use factor-variable
notation.
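
As a minimal sketch of both points, the commands below first cross-tabulate the categorical predictor against the outcome to look for perfect predictors, then fit the model with the factor-variable specification from the original question. The final -tabulate- with e(sample) is an added check, not part of the thread, for seeing which levels of rep78 remain in the estimation sample.

```stata
sysuse auto, clear

* Zero cells in this table flag levels of rep78 that predict
* foreign perfectly (levels 1 and 2 in the output quoted below).
tabulate rep78 foreign

* Factor-variable notation, as in the original question:
* -logit- sees every level, so it can drop the perfect predictors
* and their observations before deciding what else to omit.
logit foreign mpg ib1.rep78

* Added check: which levels of rep78 are left in the estimation sample?
tabulate rep78 if e(sample)
```

Running the same e(sample) check after the hand-made specification with d2-d5 should make the difference visible, since that model keeps the observations indicated by d1.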

Uli


On Monday, 06.02.2012, at 10:11 -0600, Jeff Pitblado, StataCorp LP
wrote:
> Ulrich Kohler <kohler@wzb.eu> is comparing results from -logit- between two
> different specifications of what seem to be the same model, but is getting
> different results:
> 
> > I cannot replicate the model 
> > 
> > . sysuse auto, clear
> > . tab rep78, gen(d)
> > . logit for mpg d2-d5
> > 
> > with factor variable notation. I tried
> > 
> > . logit for mpg ib1.rep78
> > 
> > but results differ. Can anybody explain why?
> > 
> > (Note as an aside that
> > 
> > . logit for mpg d1-d5
> > 
> > reproduces the factor variables solution, but normally I would not
> > specify the model this way)
> 
> Here is the output from Uli's first model:
> 
> ***** BEGIN:
> . logit for mpg d2-d5
> 
> note: d2 != 0 predicts failure perfectly
>       d2 dropped and 8 obs not used
> 
> Iteration 0:   log likelihood = -39.273156  
> Iteration 1:   log likelihood = -26.016988  
> Iteration 2:   log likelihood = -25.527683  
> Iteration 3:   log likelihood = -25.487362  
> Iteration 4:   log likelihood = -25.480362  
> Iteration 5:   log likelihood = -25.478768  
> Iteration 6:   log likelihood = -25.478391  
> Iteration 7:   log likelihood = -25.478309  
> Iteration 8:   log likelihood = -25.478292  
> Iteration 9:   log likelihood = -25.478288  
> Iteration 10:  log likelihood = -25.478287  
> 
> Logistic regression                               Number of obs   =         61
>                                                   LR chi2(4)      =      27.59
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3513
> 
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
>           d2 |          0  (omitted)
>           d3 |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
>           d4 |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
>           d5 |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
>        _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
> ------------------------------------------------------------------------------
> ***** END:
> 
> Technically, this model should not have converged.  The coefficients on the
> binary predictors are way too big; the standard errors don't look reasonable
> either.
> 
> The problem here is that 'd1' and 'd2' are perfect predictors for 'foreign', but
> Uli dropped 'd1' from the list of predictors.  Dropping a level from the
> indicators of a factor variable is normally a natural thing to want to do.  One
> of the levels is going to be omitted because of collinearity anyway, so by
> dropping one yourself you can control which level is treated as the base level
> for the fitted coefficients of the factor variable.  But 'd1' is a perfect
> predictor, so -logit- would have dropped it along with 'd2' (and the
> observations they indicate) for that reason and then found that it still needed
> to drop one of the other 'd#' variables because of collinearity.  However, by
> not including 'd1' in the list of predictors, the observations that 'd1'
> indicates are left in the estimation sample, and -logit- is unable to identify
> that it has a collinearity problem.
> 
> We can prevent this by adding 'd1' back into the list of predictors:
> 
> ***** BEGIN:
> . logit for mpg d1-d5
> 
> note: d1 != 0 predicts failure perfectly
>       d1 dropped and 2 obs not used
> 
> note: d2 != 0 predicts failure perfectly
>       d2 dropped and 8 obs not used
> 
> note: d5 omitted because of collinearity
> Iteration 0:   log likelihood = -38.411464  
> Iteration 1:   log likelihood = -25.814503  
> Iteration 2:   log likelihood = -25.480135  
> Iteration 3:   log likelihood = -25.478287  
> Iteration 4:   log likelihood = -25.478287  
> 
> Logistic regression                               Number of obs   =         59
>                                                   LR chi2(3)      =      25.87
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3367
> 
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310946    .070733     1.85   0.064    -.0075396    .2697287
>           d1 |          0  (omitted)
>           d2 |          0  (omitted)
>           d3 |  -3.136422   1.044601    -3.00   0.003    -5.183803    -1.08904
>           d4 |  -1.119903   .9741478    -1.15   0.250    -3.029198    .7893916
>           d5 |          0  (omitted)
>        _cons |  -1.723275   1.776453    -0.97   0.332    -5.205059    1.758509
> ------------------------------------------------------------------------------
> ***** END:
> 
> Uli already mentioned that this specification reproduces the results from the
> one using factor variables.
> 
> We do not recommend this, but Uli can reproduce the first model specification
> using factor variables notation by explicitly specifying the levels of 'rep78'
> to use:
> 
> ***** BEGIN:
> . logit for mpg i(2/5).rep78
> 
> note: 2.rep78 != 0 predicts failure perfectly
>       2.rep78 dropped and 8 obs not used
> 
> Iteration 0:   log likelihood = -39.273156  
> Iteration 1:   log likelihood = -26.016988  
> Iteration 2:   log likelihood = -25.527683  
> Iteration 3:   log likelihood = -25.487362  
> Iteration 4:   log likelihood = -25.480362  
> Iteration 5:   log likelihood = -25.478768  
> Iteration 6:   log likelihood = -25.478391  
> Iteration 7:   log likelihood = -25.478309  
> Iteration 8:   log likelihood = -25.478292  
> Iteration 9:   log likelihood = -25.478288  
> Iteration 10:  log likelihood = -25.478287  
> 
> Logistic regression                               Number of obs   =         61
>                                                   LR chi2(4)      =      27.59
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -25.478287                       Pseudo R2       =     0.3513
> 
> ------------------------------------------------------------------------------
>      foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          mpg |   .1310881   .0707293     1.85   0.064    -.0075387    .2697149
>              |
>        rep78 |
>           2  |          0  (empty)
>           3  |   14.28187   2465.084     0.01   0.995    -4817.194    4845.758
>           4  |   16.29835   2465.084     0.01   0.995    -4815.177    4847.774
>           5  |   17.41793   2465.084     0.01   0.994    -4814.058    4848.894
>              |
>        _cons |  -19.14137   2465.084    -0.01   0.994    -4850.618    4812.335
> ------------------------------------------------------------------------------
> ***** END:
> 
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

