
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Understanding Factor variables - is order significant ?


From   "Michael N. Mitchell" <[email protected]>
To   [email protected]
Subject   Re: st: Understanding Factor variables - is order significant ?
Date   Tue, 25 May 2010 19:19:07 -0700

Dear Richard

  I think I need to use my glasses!

  Yes, Richard, you are exactly on target. The issue relates to the use of -#- instead of -##-.

My previous answer still holds in the sense that a#b and b#a give you a different reference "cell", which effectively re-scrambles the coding.
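One way to see the coding that each ordering of a pure interaction produces is to ask -fvexpand- for the expanded term list. This is a sketch assuming Stata 11 or later, and it reuses the bigtrunk and biglen indicators from the auto example. The exact base ("b") and omitted ("o") flags in the expansion may differ across Stata versions, so no output is reproduced here.

```stata
* Sketch: inspect how Stata expands each ordering of a pure interaction.
* bigtrunk and biglen are the indicators built in the auto example.
sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen   = length > 190

* fvexpand stores the expanded factor-variable list in r(varlist)
fvexpand bigtrunk#biglen
local ab "`r(varlist)'"
fvexpand biglen#bigtrunk
local ba "`r(varlist)'"

* Compare the two expansions side by side
display "bigtrunk#biglen -> `ab'"
display "biglen#bigtrunk -> `ba'"
```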

However, a##b and b##a give the same results, as the example below shows using the auto dataset and a simple regression:

. sysuse auto
(1978 Automobile Data)
. generate bigtrunk = trunk > 15
. generate biglen   = length > 190

. regress mpg bigtrunk##biglen

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   23.21
       Model |  1218.59972     3  406.199906           Prob > F      =  0.0000
    Residual |  1224.85974    70  17.4979963           R-squared     =  0.4987
-------------+------------------------------           Adj R-squared =  0.4772
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1831

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.bigtrunk |  -1.939394    2.52248    -0.77   0.445    -6.970323    3.091535
    1.biglen |  -7.806061   1.509981    -5.17   0.000    -10.81762   -4.794499
             |
    bigtrunk#|
      biglen |
        1 1  |    1.35368   2.955949     0.46   0.648    -4.541775    7.249134
             |
       _cons |   25.60606   .7281774    35.16   0.000     24.15376    27.05836
------------------------------------------------------------------------------

.
. regress mpg biglen##bigtrunk

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  3,    70) =   23.21
       Model |  1218.59972     3  406.199906           Prob > F      =  0.0000
    Residual |  1224.85974    70  17.4979963           R-squared     =  0.4987
-------------+------------------------------           Adj R-squared =  0.4772
       Total |  2443.45946    73  33.4720474           Root MSE      =  4.1831

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.biglen |  -7.806061   1.509981    -5.17   0.000    -10.81762   -4.794499
  1.bigtrunk |  -1.939394    2.52248    -0.77   0.445    -6.970323    3.091535
             |
      biglen#|
    bigtrunk |
        1 1  |    1.35368   2.955949     0.46   0.648    -4.541775    7.249134
             |
       _cons |   25.60606   .7281774    35.16   0.000     24.15376    27.05836
------------------------------------------------------------------------------
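The same comparison can be made without reading the tables, by storing the residual sum of squares from each fit. Stata documents a##b as shorthand for main effects plus the interaction (i.a i.b i.a#i.b), so the sketch below also fits that explicit form. It is a minimal check, assuming Stata 11 or later; the scalar names are illustrative only.

```stata
* Sketch: both ## orderings, and the explicit expansion
* i.bigtrunk i.biglen i.bigtrunk#i.biglen, should give the same fit.
sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen   = length > 190

quietly regress mpg bigtrunk##biglen
scalar rss_ab = e(rss)
quietly regress mpg biglen##bigtrunk
scalar rss_ba = e(rss)
quietly regress mpg i.bigtrunk i.biglen i.bigtrunk#i.biglen
scalar rss_ex = e(rss)

display "RSS: " scalar(rss_ab) "  " scalar(rss_ba) "  " scalar(rss_ex)
```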

I hope that helps,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-05-25 8.06 PM, Richard Williams wrote:
At 08:32 PM 5/25/2010, Michael N. Mitchell wrote:
Extend that idea to your interaction... Suppose you flip the coding of
your "ra" and "dm" variables. Note that the test of the interaction
(its p value) will remain the same, assuming both are dummy variables.
The coefficients of "ra" and "dm", however, will change because of the
change in coding. The details get more complicated, but are explained
in section 3.5 of
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter3/statareg3.htm
The explanation uses the old "xi" terminology, but the issues are the
same.
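The claim about flipping a dummy follows from simple algebra. With y = b0 + b1*T + b2*L + b3*T*L and a recoded L' = 1 - L, substituting gives y = (b0 + b2) + (b1 + b3)*T - b2*L' - b3*T*L'. The interaction coefficient only changes sign, so its standard error, t, and p value are unchanged, while the main effects are reparameterized. The sketch below checks this on the auto example rather than on "ra" and "dm"; smalllen is a hypothetical recoded variable introduced only for illustration.

```stata
* Sketch: flipping the coding of one dummy changes the main-effect
* coefficients but not the test of the interaction.
* smalllen is a hypothetical recoded indicator (1 - biglen).
sysuse auto, clear
generate bigtrunk = trunk > 15
generate biglen   = length > 190
generate smalllen = 1 - biglen

quietly regress mpg bigtrunk##biglen
quietly test 1.bigtrunk#1.biglen
scalar p_orig = r(p)

quietly regress mpg bigtrunk##smalllen
quietly test 1.bigtrunk#1.smalllen
scalar p_flip = r(p)

display "p (original) = " scalar(p_orig) "   p (flipped) = " scalar(p_flip)
```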

He is not changing the coding, though. He is just flipping the order
of the terms, i.e. b1.ra#b0.dm in one model and b0.dm#b1.ra in the
other, like using female * race versus race * female.

I'd be curious to know whether the two models produce identical fits.
That would indicate whether the parameterizations are equivalent. If
not, then something is getting screwed up.

I suspect using ## instead of # might solve the problem -- and that
would be my preference anyway.

The following code also produces inconsistent results, with the 3rd
model being wrong. It isn't clear to me why that is the case.

use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear
ologit warm yr89#male, nolog
ologit warm b0.male#b1.yr89, nolog
ologit warm b1.yr89#b0.male, nolog

I hate to accuse Stata of having a bug, but I am starting to wonder...
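Whether the orderings "produce identical fits" can be checked directly by storing the log likelihood, e(ll), of each -ologit- model. The sketch below does that for the three models above plus the ## versions suggested earlier. It assumes the ordwarm2 URL still resolves and that Stata 11 or later is in use. No log-likelihood values are claimed here; the only expectation built in is that the two ## orderings, being the same model, agree.

```stata
* Sketch: compare e(ll) across orderings; equivalent parameterizations
* of the same model should reach the same log likelihood.
use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear

quietly ologit warm yr89#male
scalar ll_1 = e(ll)
quietly ologit warm b0.male#b1.yr89
scalar ll_2 = e(ll)
quietly ologit warm b1.yr89#b0.male
scalar ll_3 = e(ll)
quietly ologit warm yr89##male
scalar ll_4 = e(ll)
quietly ologit warm male##yr89
scalar ll_5 = e(ll)

* Models 1-3 are the # orderings above; 4-5 are the ## versions
forvalues i = 1/5 {
    display "model `i': ll = " %12.4f scalar(ll_`i')
}
```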

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*


© Copyright 1996–2018 StataCorp LLC