



Re: st: Understanding Factor variables - is order significant ?


From   "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Understanding Factor variables - is order significant ?
Date   Tue, 25 May 2010 20:50:35 -0700

Dear Richard

I now see what you are talking about! I am confused by this as well. So, I switched to a dataset better suited to Poisson regression, the UCLA ATS example, as shown below...

. use http://www.ats.ucla.edu/stat/stata/dae/poissonreg, clear
(Two Los Angeles High Schools)
. gen himath = math > 50

Using the term -ib1.himath#ib0.male-, it is as though four groups are entered, with the group labeled himath=1 male=0 as the reference group.

. regress daysabs ib1.himath#ib0.male

      Source |       SS       df       MS              Number of obs =     316
-------------+------------------------------           F(  3,   312) =    7.07
       Model |    1112.497     3  370.832335           Prob > F      =  0.0001
    Residual |  16366.1106   312  52.4554827           R-squared     =  0.0636
-------------+------------------------------           Adj R-squared =  0.0546
       Total |  17478.6076   315  55.4876432           Root MSE      =  7.2426

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 himath#male |
        0 0  |   3.062032   1.139457     2.69   0.008     .8200395    5.304025
        0 1  |   1.491369   1.159842     1.29   0.199    -.7907317     3.77347
        1 1  |  -2.010909   1.175009    -1.71   0.088    -4.322853    .3010348
             |
       _cons |   5.090909   .8253727     6.17   0.000     3.466909    6.714909
------------------------------------------------------------------------------


So here I reproduce the results, explicitly entering the four groups (and omitting group 3 via ib3.group) and we can see the results are the same...

. generate group = .
(316 missing values generated)

. replace group = 1 if himath==0 & male==0
(85 real changes made)

. replace group = 2 if himath==0 & male==1
(79 real changes made)

. replace group = 3 if himath==1 & male==0
(77 real changes made)

. replace group = 4 if himath==1 & male==1
(75 real changes made)
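
As a cross-check on this coding (an aside, and only a sketch: -group2- is a hypothetical name, and I am assuming neither variable has missing values), the same four-level variable could presumably be built in one step with -egen-, which numbers the combinations in sorted order of himath and then male. The -assert- stops with an error if the two codings disagree anywhere.

. egen group2 = group(himath male), label
. assert group2 == group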

. regress daysabs ib3.group

      Source |       SS       df       MS              Number of obs =     316
-------------+------------------------------           F(  3,   312) =    7.07
       Model |    1112.497     3  370.832335           Prob > F      =  0.0001
    Residual |  16366.1106   312  52.4554827           R-squared     =  0.0636
-------------+------------------------------           Adj R-squared =  0.0546
       Total |  17478.6076   315  55.4876432           Root MSE      =  7.2426

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          1  |   3.062032   1.139457     2.69   0.008     .8200395    5.304025
          2  |   1.491369   1.159842     1.29   0.199    -.7907317     3.77347
          4  |  -2.010909   1.175009    -1.71   0.088    -4.322853    .3010348
             |
       _cons |   5.090909   .8253727     6.17   0.000     3.466909    6.714909
------------------------------------------------------------------------------


So, as you suggest, let's try this using a -poisson- model. Here is the result using -ib1.himath#ib0.male-. The coding still leaves the group labeled himath=1 male=0 as the reference group, but the results include a coefficient for himath=0 male=0 that has no standard error. Does this also occur when using -group-?

. poisson daysabs ib1.himath#ib0.male

Iteration 0:   log likelihood = -1600.2092
Iteration 1:   log likelihood = -1564.0399
Iteration 2:   log likelihood = -1563.1148
Iteration 3:   log likelihood = -1563.1144
Iteration 4:   log likelihood = -1563.1144

Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     144.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -1563.1144                       Pseudo R2       =     0.0443

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 himath#male |
        0 0  |   .4701202          .        .       .            .           .
        0 1  |   -.017358   .0533361    -0.33   0.745    -.1218947    .0871788
        1 1  |  -.7768093   .0724615   -10.72   0.000    -.9188312   -.6347875
             |
       _cons |   1.901739   .0303588    62.64   0.000     1.842237    1.961241
------------------------------------------------------------------------------

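Here are the results for the poisson model, explicitly entering the four groups (and omitting group 3 via ib3.group). The results are very different from above: every coefficient now has a standard error, the LR test has 3 degrees of freedom rather than 2, and the log likelihood is higher (-1534.3618 versus -1563.1144).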

. poisson daysabs ib3.group

Iteration 0:   log likelihood = -1534.3667
Iteration 1:   log likelihood = -1534.3618
Iteration 2:   log likelihood = -1534.3618

Poisson regression                                Number of obs   =        316
                                                  LR chi2(3)      =     202.49
                                                  Prob > chi2     =     0.0000
Log likelihood = -1534.3618                       Pseudo R2       =     0.0619

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          1  |   .4709223   .0631983     7.45   0.000      .347056    .5947887
          2  |   .2569245   .0668887     3.84   0.000     .1258251     .388024
          4  |  -.5025268   .0829459    -6.06   0.000    -.6650978   -.3399558
             |
       _cons |   1.627456   .0505076    32.22   0.000     1.528463     1.72645
------------------------------------------------------------------------------
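
One check that may help decide which set of estimates to trust (a rough sketch, using only numbers copied from the -regress- output above): with four groups and three indicators the model is saturated, so the Poisson fitted mean for each group should equal its observed mean. If so, _cons should be the log of the group 3 mean, and each coefficient should be the log of the ratio of that group's mean to the group 3 mean.

. display ln(5.090909)
. display ln((5.090909 + 3.062032)/5.090909)
. display ln((5.090909 + 1.491369)/5.090909)
. display ln((5.090909 - 2.010909)/5.090909)

By my arithmetic these come to roughly 1.6275, 0.4709, 0.2569, and -0.5025, which agree with the -ib3.group- estimates and not with the -ib1.himath#ib0.male- estimates. Running -tabstat daysabs, by(group)- should show the same group means directly.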

  This is a perplexing state of affairs! I don't know how to explain this!

I hope someone can help explain!

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-05-25 8.43 PM, Richard Williams wrote:
At 09:19 PM 5/25/2010, Michael N. Mitchell wrote:
Dear Richard

I think I need to use my glasses!

Yes, Richard, you are exactly on target. It relates to the use of -#-
instead of -##- .

My previous answer is still true, in the sense that when you do a#b
and b#a, that you get a different reference "cell", and thus it is
re-scrambling the coding.

I agree with everything you say. But, it still isn't clear to me why it
should make a difference whether you use b1.ra#b0.dm versus b0.dm#b1.ra.
Tweaking your example, the following yield identical or equivalent
results for regress but not for poisson:

sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen = length > 190
regress mpg bigtrunk#biglen
regress mpg b1.bigtrunk#b0.biglen
regress mpg b0.biglen#b1.bigtrunk
poisson mpg bigtrunk#biglen, nolog
poisson mpg b1.bigtrunk#b0.biglen, nolog
poisson mpg b0.biglen#b1.bigtrunk, nolog

Use of ## in the last 2 commands avoids the problem, but why is there a
problem in the first place?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*

