



Re: st: Understanding Factor variables - is order significant ?


From   "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Understanding Factor variables - is order significant ?
Date   Tue, 25 May 2010 18:32:00 -0700

Dear Jesper

This is not really a "Stata" issue, but an issue regarding the coding of dummy variables. Let's take an even simpler case. Suppose you have a variable named "female" that is coded 1 if someone is female and 0 if they are male. Now suppose you change your mind and instead include a variable named "male" that is coded 1 if someone is male and 0 otherwise. The coefficient will change (in the case of a linear model, it will be of opposite sign), but note that the p value will remain the same, as it is still a test of the difference between males and females.
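As a minimal sketch of this point, assuming Stata's shipped auto dataset stands in for your data: its "foreign" dummy plays the role of "female", and "domestic" is a hypothetical variable created here only for illustration.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* foreign is coded 1 = foreign car, 0 = domestic car
regress price foreign
display "foreign:   b = " _b[foreign] "   t = " _b[foreign]/_se[foreign]

* Flip the coding (illustrative variable): 1 = domestic, 0 = foreign
generate byte domestic = 1 - foreign
regress price domestic
display "domestic:  b = " _b[domestic] "   t = " _b[domestic]/_se[domestic]
```

The two slopes should have the same magnitude and opposite signs, and the t statistics should differ only in sign, so the p value is unchanged. In a model reported with the irr option, the same flip would show up as the reciprocal of the original IRR rather than as a sign change.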

Extend that idea to your interaction. Suppose you flip the coding of your "ra" and "dm" variables. The test of the interaction (its p value) will remain the same, assuming both are dummy variables. The coefficients of "ra" and "dm", however, will change because of the change in coding. The details get more complicated, but they are explained in section 3.5 of http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter3/statareg3.htm . It is explained using the old "xi" terminology, but the issues are still the same.
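A minimal sketch of the interaction case, again assuming the shipped auto dataset rather than your data. Here "foreign" and a hypothetical dummy "highmpg" (created only for illustration, with an arbitrary cutoff) stand in for "ra" and "dm", and a linear model is used to keep things simple.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Illustrative second dummy; the cutoff of 25 is arbitrary
generate byte highmpg = mpg > 25

* Default base levels: 0 for both factors
regress price i.foreign##i.highmpg
display "interaction, base 0/0:  b = " _b[1.foreign#1.highmpg]

* Base levels flipped to 1 for both factors
regress price ib1.foreign##ib1.highmpg
display "interaction, base 1/1:  b = " _b[0.foreign#0.highmpg]
```

With two dummies, the interaction coefficient and its p value should be identical in both fits. The coefficients on "foreign" and "highmpg" differ because each is now the effect of one factor at the other factor's new base level.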

I hope that helps.

Best of luck,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-05-25 3.22 PM, Jesper Lindhardsen wrote:
Dear Statalisters,

I am having a hard time understanding why 2 regression models that
differ only by the "order" of the included factor variables yield
different results???
I can't (or am too slow to) find the answer in the documentation, but I
think it is related to the parsing of the baselevel specifiers (see
model 1 legend = _b[0o.ra#0b.dm] ???).

Here are the 2 commands and resulting output. As you can see, I've only
changed b1.ra#b0.dm to b0.dm#b1.ra. The output has been edited: parts that
are identical between the models have been left out.

(System: Stata 11/MP for Windows, born 10 Feb 2010)

1)
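poisson _d b1.ra#b0.dm i.alder_k sex if ex==0, e(risk_tid) irr coeflegend

----------------------------------------
       _d |       IRR   Legend
----------+-----------------------------
    ra#dm |
      0 0 |  1.487748   _b[0o.ra#0b.dm]
      0 1 |  1.968017   _b[0.ra#1.dm]
      1 1 |  2.787839   _b[1b.ra#1.dm]
          |
  alder_k |
        1 |  6.176815   _b[1.alder_k]
        2 |  18.09798   _b[2.alder_k]
          |
      sex |  2.070646   _b[sex]
 risk_tid | (exposure)
----------------------------------------
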
2)

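poisson _d b0.dm#b1.ra i.alder_k sex if ex==0, e(risk_tid) irr coeflegend

----------------------------------------
       _d |       IRR   Legend
----------+-----------------------------
    dm#ra |
      0 0 |  .5935912   _b[0b.dm#0.ra]
      1 0 |  1.169963   _b[1.dm#0.ra]
      1 1 |   1.65762   _b[1.dm#1b.ra]
          |
  alder_k |
        1 |  6.171095   _b[1.alder_k]
        2 |  18.07456   _b[2.alder_k]
          |
      sex |  2.072329   _b[sex]
 risk_tid | (exposure)
----------------------------------------
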
Hope it's not too elementary.

Thank you all for your contributions to Statalist; it's a really
valuable source of information for me.
Regards,


Jesper Lindhardsen
MD, Ph.d. student
Department of Cardiovascular Research
Copenhagen University Hospital, Gentofte
Denmark


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*

