Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: Re-re-post: Stata 11 - Factor variables in a regression command |
Date | Fri, 30 Apr 2010 23:42:50 -0700 |
Dear Ricardo,

The command

. logistic y a#b

includes just the interaction of "a by b", and does not include the main effect of a, nor the main effect of b. By contrast, the command

. logistic y a##b

includes the main effect of a, the main effect of b, as well as the a by b interaction. It is equivalent to typing

. logistic y a#b a b

As John Fox describes in his regression book, a properly formed regression model which contains an interaction will also include all the lower order main effects. In other words, when including a#b, you also include a and b. There are instances where one could omit the main effects, but only if you know exactly why you are doing so and understand the ramifications in terms of the interpretation of the terms in the model.
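
As a rough numerical check of how the two sets of estimates relate, here is a minimal sketch that only reuses the odds ratios from the output quoted below. The scalar names or_01, or_10, and or_11 are made up for illustration and are not part of either command's results. It assumes, as the output suggests, that a and b are both coded 0/1. In that case a#b on its own produces one indicator for each of the three non-base cells, so both commands fit the same saturated model to the 2x2 table and differ only in how the three parameters are labelled.

```stata
* Odds ratios copied from the a#b output (base cell: a=0, b=0)
* or_01 is the cell a=0, b=1
scalar or_01 = 1.567419
* or_10 is the cell a=1, b=0
scalar or_10 = 1.447424
* or_11 is the cell a=1, b=1
scalar or_11 = 1.211988

* In the a##b output, the main effects are the two single-factor cells
display "1.a     = " or_10
display "1.b     = " or_01

* The a##b interaction term is the (1,1) cell divided by the
* product of the two main effects
display "1.a#1.b = " or_11 / (or_10 * or_01)
```

The last line should come out at about 0.534, in line with the .5342167 reported for the 1 1 row of a#b in the second regression. This is why the log likelihood and LR chi2 are identical in the two outputs here; with factors that have more levels, or with other covariates in the model, the same kind of comparison would need to be redone rather than assumed.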
I hope that is helpful.

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

On 2010-04-30 10.48 PM, Ricardo Basurto wrote:
Not the best way to start posting to StataList, is it? I am re-arranging my message hoping that at least that way my question won't be cut out. (If anyone has suggestions on how to successfully submit messages from within Gmail, I would appreciate those as well.)

----------------------------------------------------------------------------

I am having trouble understanding the difference between a regression that uses a cross operator (#) and one that uses a cross factorial operator (##). For example, below is the output I get from running two different regressions. From the log-likelihood ratio, chi2, etc., it seems clear to me that both commands are fitting the same regression model. Also, I can reproduce the second regression by fitting a regression with dummies for a=1, b=1, and a variable equal to the multiplication of those two dummies; however, I just can't figure out what exact model is being fitted in the first regression. Can anyone explain this?

Thank you,
Ricardo

REGRESSION #1:

. logistic y a#b

Logistic regression                               Number of obs   =      19670
                                                  LR chi2(3)      =       7.71
                                                  Prob > chi2     =     0.0525
Log likelihood = -1473.1898                       Pseudo R2       =     0.0026

----------------------------------------------------------------------------
         y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Int.]
-----------+----------------------------------------------------------------
       a#b |
      0 1  |   1.567419   .2804138     2.51   0.012     1.1038     2.2256
      1 0  |   1.447424   .2588797     2.07   0.039     1.0194     2.0551
      1 1  |   1.211988   .2246236     1.04   0.300     .84283     1.7428
----------------------------------------------------------------------------

REGRESSION #2:

. logistic y a##b

Logistic regression                               Number of obs   =      19670
                                                  LR chi2(3)      =       7.71
                                                  Prob > chi2     =     0.0525
Log likelihood = -1473.1898                       Pseudo R2       =     0.0026

----------------------------------------------------------------------------
         y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Int.]
-----------+----------------------------------------------------------------
       1.a |   1.447424   .2588797     2.07   0.039     1.0194     2.0551
       1.b |   1.567419   .2804138     2.51   0.012     1.1038     2.2256
           |
       a#b |
      1 1  |   .5342167   .1302597    -2.57   0.010     .33125     .86152
----------------------------------------------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/