



Re: Re-re-post: Stata 11 - Factor variables in a regression command


From   Richard Williams <Williams.NDA@comcast.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: Re-re-post: Stata 11 - Factor variables in a regression command
Date   Sat, 01 May 2010 10:50:25 -0500

At 01:42 AM 5/1/2010, Michael Norman Mitchell wrote:
Dear Ricardo

  The command

. logistic y a#b

includes just the interaction of "a by b", and does not include the main effect of a, nor the main effect of b. By contrast, the command

. logistic y a##b

includes the main effect of a, the main effect of b, as well as the a by b interaction. It is equivalent to typing

. logistic y a#b a b

I don't think this is quite right. As the original example shows, the fits produced by the first two syntaxes are identical, so a#b and a##b are different ways of parameterizing the same model. a##b gives you the main effect of a, the main effect of b, and the interaction, i.e., it is the same as entering a, b, and a*b in the model, where a*b = 1 if a and b both equal 1 and 0 otherwise. I believe this is equivalent to your third syntax, except that I would write i.a and i.b so Stata knows these are categorical variables.

With a#b and two 0/1 variables, there are four possible combinations of values: 0 0, 0 1, 1 0, and 1 1. The first is dropped as the base category and the other three are in the model.

These are two parameterizations of the same model; personally I prefer the a##b approach because it separates main effects from interaction effects.
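
To make the point about the two codings concrete, here is a minimal sketch in Python (not Stata) of the columns each syntax creates for two 0/1 variables. The function names are made up for illustration; the only thing taken from the discussion above is that a#b uses one indicator per non-base cell and a##b uses a, b, and a*b.

```python
# The four (a, b) cells for two binary factors.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]

def hash_coding(a, b):
    """a#b: one indicator per non-base cell (the 0 0 cell is the base)."""
    return [int((a, b) == (0, 1)), int((a, b) == (1, 0)), int((a, b) == (1, 1))]

def dhash_coding(a, b):
    """a##b: main effect of a, main effect of b, and the product a*b."""
    return [a, b, a * b]

def hash_from_dhash(row):
    """Rebuild the a#b indicators as linear combinations of the a##b columns."""
    a, b, ab = row
    return [b - ab, a - ab, ab]

for a, b in cells:
    print((a, b), "a#b:", hash_coding(a, b), "a##b:", dhash_coding(a, b))
```

Because each set of columns can be rebuilt from the other by a linear combination, the two syntaxes span the same model and should give the same fit; only the meaning of the individual coefficients changes.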

The following example illustrates the 3 different approaches, and shows the equivalence of the last 2 approaches in Michael's example:

. use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear
(77 & 89 General Social Survey)

. logit  warmlt2 yr89#male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   yr89#male |
        0 1  |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
        1 0  |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
        1 1  |   -.659902   .2022755    -3.26   0.001    -1.056355   -.2634493
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------

. logit  warmlt2 yr89##male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
      1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
             |
   yr89#male |
        1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------

. logit  warmlt2 i.yr89 i.male yr89#male, nolog

Logistic regression                               Number of obs   =       2293
                                                  LR chi2(3)      =      64.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -851.54241                       Pseudo R2       =     0.0366

------------------------------------------------------------------------------
     warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
      1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
             |
   yr89#male |
        1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
             |
       _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
------------------------------------------------------------------------------
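
As a rough check on the output above, the sketch below (Python, not Stata) copies the printed coefficients by hand and confirms that the two parameterizations imply the same log odds in every cell, and that the yr89##male interaction equals the 1 1 cell coefficient minus the 1 0 and 0 1 cell coefficients. The variable and function names are made up for illustration, and agreement is only up to the rounding of the displayed output.

```python
# Coefficients copied from the yr89#male output (one term per non-base cell).
cons = -1.667376
cell = {(0, 1): 0.1816812, (1, 0): -1.295833, (1, 1): -0.659902}

# Coefficients copied from the yr89##male output (main effects + interaction).
b_yr89, b_male, b_inter = -1.295833, 0.1816812, 0.4542502

def logodds_cell(yr89, male):
    """Linear predictor under the yr89#male parameterization."""
    return cons + cell.get((yr89, male), 0.0)

def logodds_main(yr89, male):
    """Linear predictor under the yr89##male parameterization."""
    return cons + b_yr89 * yr89 + b_male * male + b_inter * yr89 * male

# Interaction implied by the cell coefficients: (1 1) - (1 0) - (0 1).
implied_inter = cell[(1, 1)] - cell[(1, 0)] - cell[(0, 1)]

for yr89 in (0, 1):
    for male in (0, 1):
        print(yr89, male, round(logodds_cell(yr89, male), 5),
              round(logodds_main(yr89, male), 5))
print("implied interaction:", round(implied_inter, 5))
```

In Stata itself, the same comparison could presumably be made with lincom after the first model, but I have not run that here.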



-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW:    http://www.nd.edu/~rwilliam
