
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: Re-re-post: Stata 11 - Factor variables in a regression command

From   Michael Norman Mitchell <[email protected]>
To   [email protected]
Subject   Re: Re-re-post: Stata 11 - Factor variables in a regression command
Date   Fri, 30 Apr 2010 23:42:50 -0700

Dear Ricardo

  The command

. logistic y a#b

includes just the interaction of "a by b", and does not include the main effect of a, nor the main effect of b. By contrast, the command

. logistic y a##b

  includes the main effect of a, the main effect of b, as well as the a by b interaction. It is equivalent to typing

. logistic y i.a i.b i.a#i.b
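The two logs in Ricardo's message below show the same log likelihood (-1473.1898) and the same LR chi2(3) = 7.71. For two 0/1 factors, this happens because `a#b` on its own, even though it lists no main-effect terms, estimates an odds ratio for each non-base cell (0 1, 1 0, 1 1, each relative to 0 0). That is a reparameterization of the same saturated model that `a##b` fits. The short Python sketch below is only an illustration, not Stata code, and its helper names are hypothetical. It builds both sets of columns for the four cells and checks that each set is a fixed linear combination of the other, so that with the constant both designs span the same space.

```python
"""Illustrative check (not Stata): for two 0/1 factors a and b, the columns
estimated by `a#b` and by `a##b` span the same space, so both fit one model."""

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (a, b); 0 0 is the base cell


def cell_columns(a, b):
    """Hypothetical helper: one indicator per non-base cell, as `a#b` reports them."""
    return {"0 1": int((a, b) == (0, 1)),
            "1 0": int((a, b) == (1, 0)),
            "1 1": int((a, b) == (1, 1))}


def factorial_columns(a, b):
    """Hypothetical helper: main effects plus product term, as `a##b` reports them."""
    return {"1.a": a, "1.b": b, "1 1": a * b}


def same_span():
    """True if each set of columns is a fixed linear combination of the other."""
    for a, b in CELLS:
        c = cell_columns(a, b)
        f = factorial_columns(a, b)
        # a##b columns written in terms of the a#b columns
        if f["1.a"] != c["1 0"] + c["1 1"]:
            return False
        if f["1.b"] != c["0 1"] + c["1 1"]:
            return False
        if f["1 1"] != c["1 1"]:
            return False
        # a#b columns written in terms of the a##b columns
        if c["1 0"] != f["1.a"] - f["1 1"]:
            return False
        if c["0 1"] != f["1.b"] - f["1 1"]:
            return False
    return True


print(same_span())
```

Because the mapping runs both ways for every cell, the two commands can only differ in how the same four cell probabilities are labelled, not in the fit.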

As John Fox describes in his regression book, a properly formed regression model that contains an interaction will also include all of the lower-order terms, here the main effects. In other words, when including a#b, you also include a and b. There are instances where one could omit the main effects, but only if you know exactly why you are doing so and understand the ramifications for the interpretation of the terms in the model.
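One such ramification concerns interpretation. In the `a##b` parameterization, the odds ratio reported for `1.a` is the effect of a only among observations with b = 0. The effect of a when b = 1 is that odds ratio multiplied by the interaction odds ratio. The following minimal Python sketch of that arithmetic uses the odds ratios from the `a##b` output in Ricardo's message below. It is illustrative only, and the constant and function names are hypothetical.

```python
"""Illustrative arithmetic (not Stata): conditional odds ratios implied by the
a##b output quoted in this thread."""

OR_A = 1.447424     # 1.a: a=1 vs a=0 when b=0
OR_B = 1.567419     # 1.b: b=1 vs b=0 when a=0
OR_AB = 0.5342167   # a#b 1 1: interaction odds ratio


def or_a_given_b(b):
    """Odds ratio for a=1 vs a=0 at a fixed level b (0 or 1) of the other factor."""
    return OR_A * OR_AB ** b


def or_b_given_a(a):
    """Odds ratio for b=1 vs b=0 at a fixed level a (0 or 1) of the other factor."""
    return OR_B * OR_AB ** a


for b in (0, 1):
    print(f"OR of a when b={b}: {or_a_given_b(b):.4f}")
for a in (0, 1):
    print(f"OR of b when a={a}: {or_b_given_a(a):.4f}")
```

With these numbers the odds ratio for a falls from about 1.45 when b = 0 to about 0.77 when b = 1. Reading `1.a` as "the" effect of a would therefore be misleading in this model.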

  I hope that is helpful.

Michael N. Mitchell
See the Stata tidbit of the week at...

On 2010-04-30 10.48 PM, Ricardo Basurto wrote:
Not the best way to start posting to StataList, is it? I am
re-arranging my message hoping that at least that way my question
won't be cut out. (If anyone has suggestions on how to successfully
submit messages from within Gmail, I would appreciate those as well.)


I am having trouble understanding the difference between a regression
that uses a cross operator (#) and one that uses a cross factorial
operator (##).
For example, below is the output I get from running two different
regressions.  From the log likelihood, LR chi2, etc., it seems clear
to me that both commands are fitting the same regression model.  Also,
I can reproduce the second regression by fitting a regression with
dummies for a=1 and b=1 and a variable equal to the product of
those two dummies; however, I just can't figure out what exact model
is being fitted in the first regression. Can anyone explain this?

Thank you,



. logistic y a#b

Logistic regression                             Number of obs   =      19670
                                               LR chi2(3)      =       7.71
                                               Prob>  chi2     =     0.0525
Log likelihood = -1473.1898                     Pseudo R2       =     0.0026

          y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Int.]
        a#b |
       0 1  |   1.567419   .2804138     2.51   0.012     1.1038    2.2256
       1 0  |   1.447424   .2588797     2.07   0.039     1.0194    2.0551
       1 1  |   1.211988   .2246236     1.04   0.300     .84283    1.7428


. logistic y a##b

Logistic regression                             Number of obs   =      19670
                                               LR chi2(3)      =       7.71
                                               Prob>  chi2     =     0.0525
Log likelihood = -1473.1898                     Pseudo R2       =     0.0026

          y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Int.]
        1.a |   1.447424   .2588797     2.07   0.039     1.0194    2.0551
        1.b |   1.567419   .2804138     2.51   0.012     1.1038    2.2256
        a#b |
       1 1  |   .5342167   .1302597    -2.57   0.010     .33125    .86152
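The two tables above answer the question of what the first model is. Each `a#b` odds ratio compares one cell with the base cell a=0, b=0, and the `a##b` odds ratios are ratios of those cell odds ratios. The following small Python sketch of the conversion is illustrative only; the dictionary keys and function name are hypothetical. It uses the odds ratios printed by `logistic y a#b`.

```python
"""Illustrative arithmetic (not Stata): recover the a##b odds ratios from the
cell odds ratios reported by `logistic y a#b` (base cell a=0, b=0)."""

CELL_OR = {"0 1": 1.567419, "1 0": 1.447424, "1 1": 1.211988}


def factorial_from_cells(cell_or):
    """Return the (1.a, 1.b, a#b 1 1) odds ratios implied by the cell odds ratios."""
    or_a = cell_or["1 0"]                    # a=1 vs a=0 within b=0
    or_b = cell_or["0 1"]                    # b=1 vs b=0 within a=0
    or_ab = cell_or["1 1"] / (or_a * or_b)   # departure from multiplicative effects
    return or_a, or_b, or_ab


or_a, or_b, or_ab = factorial_from_cells(CELL_OR)
print(f"1.a = {or_a:.6f}  1.b = {or_b:.6f}  a#b 1 1 = {or_ab:.7f}")
```

The main-effect odds ratios are simply the 1 0 and 0 1 cell odds ratios. The computed interaction odds ratio agrees with the .5342167 in the `a##b` table to the precision shown.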
*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC