

From: Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com>
Subject: Re: Re-re-post: Stata 11 - Factor variables in a regression command
Date: Sat, 01 May 2010 10:31:05 -0700

Greetings

Richard Williams wrote...

--- snip ---
As the original example shows, the fits produced by the first two syntaxes are identical.
--- snip ---

I completely agree with Richard that

. logistic y a#b

and

. logistic y a##b

are two different ways of parameterizing a model with two categorical predictors. If we let factor a have A levels and factor b have B levels, then both models will have

(A-1) + (B-1) + (A-1)*(B-1)

parameters in the model. In fact, this illustrates how the parameters are decomposed in a traditional parameterization (i.a i.b a#b), which splits them into "main effect of a" (A-1 df), "main effect of b" (B-1 df), and "a by b interaction" ((A-1)*(B-1) df).

If, instead, one specifies -a#b-, this term has (A-1) + (B-1) + (A-1)*(B-1) df and is no longer partitioned into main effect of a, main effect of b, and interaction. The omnibus test of this effect is the overall test of the null hypothesis that there is simultaneously no main effect of a, no main effect of b, and no a by b interaction. As I show below, it simply tests the equality of means in all of the cells. I think this is rarely of research interest when one has this kind of "factorial" layout.

So, if this is what the omnibus test is doing, what about the individual parameters? Looking at Ricardo's initial example

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Int.]
-------------+----------------------------------------------------------------
         a#b |
        0 1  |   1.567419   .2804138     2.51   0.012     1.1038    2.2256
        1 0  |   1.447424   .2588797     2.07   0.039     1.0194    2.0551
        1 1  |   1.211988   .2246236     1.04   0.300     .84283    1.7428
------------------------------------------------------------------------------

Note how this is much like a "oneway" layout of the data, where there are four groups, and one of the groups is an omitted group (the group a=0 b=0 is the omitted group). So, each of these parameters is testing whether the "cell" differs from the omitted cell. That is, the first parameter tests whether the cell labeled a=0 b=1 differs from the cell a=0 b=0. It is as though the design had been converted into having four groups (labeled 1 2 3 4, with group 1 being the omitted group corresponding to a=0 b=0). Then, the tests compare group 2 vs. 1, group 3 vs. 1, and group 4 vs. 1. The omnibus test of all the parameters, as noted above, tests the equality of all of the cell means.

Returning to Richard's point, as he notes, this is just an alternative parameterization of the original model, now where each cell is compared to a reference cell. If this is the series of comparisons a researcher wants to make, this is a very useful parameterization.

I hope that is useful to Ricardo, and any other readers,

Best regards,

Michael

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

On 2010-05-01 8.50 AM, Richard Williams wrote:
> At 01:42 AM 5/1/2010, Michael Norman Mitchell wrote:
>> Dear Ricardo
>>
>> The command
>>
>> . logistic y a#b
>>
>> includes just the interaction of "a by b", and does not include the main effect of a, nor the main effect of b. By contrast, the command
>>
>> . logistic y a##b
>>
>> includes the main effect of a, the main effect of b, as well as the a by b interaction. It is equivalent to typing
>>
>> . logistic y a#b a b
>
> I don't think this is quite right. As the original example shows, the fits produced by the first two syntaxes are identical. So, a#b and a##b are different ways of parameterizing the models. a##b gives you the main effect of a, the main effect of b, and the interaction, i.e. it is the same as entering a, b, and a*b in the model. a*b = 1 if a and b both equal 1, 0 otherwise. I believe this is equivalent to your 3rd syntax, except I would say i.a and i.b so Stata knows these are categorical variables.
>
> With a#b, there are four possible combinations of values: 0 0, 0 1, 1 0, and 1 1.
> The first gets dropped and the other three are in the model.
>
> These are two parameterizations of the same model; personally I prefer the a##b approach because it separates main effects from interaction effects.
>
> The following example illustrates the 3 different approaches, and shows the equivalence of the last 2 approaches in Michael's example:
>
> . use "http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta", clear
> (77 & 89 General Social Survey)
>
> . logit warmlt2 yr89#male, nolog
>
> Logistic regression                               Number of obs   =       2293
>                                                   LR chi2(3)      =      64.74
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -851.54241                       Pseudo R2       =     0.0366
>
> ------------------------------------------------------------------------------
>      warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>    yr89#male |
>         0 1  |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
>         1 0  |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
>         1 1  |   -.659902   .2022755    -3.26   0.001    -1.056355   -.2634493
>              |
>        _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
> ------------------------------------------------------------------------------
>
> . logit warmlt2 yr89##male, nolog
>
> Logistic regression                               Number of obs   =       2293
>                                                   LR chi2(3)      =      64.74
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -851.54241                       Pseudo R2       =     0.0366
>
> ------------------------------------------------------------------------------
>      warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
>       1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
>              |
>    yr89#male |
>         1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
>              |
>        _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
> ------------------------------------------------------------------------------
>
> . logit warmlt2 i.yr89 i.male yr89#male, nolog
>
> Logistic regression                               Number of obs   =       2293
>                                                   LR chi2(3)      =      64.74
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -851.54241                       Pseudo R2       =     0.0366
>
> ------------------------------------------------------------------------------
>      warmlt2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.yr89 |  -1.295833    .229115    -5.66   0.000     -1.74489   -.8467762
>       1.male |   .1816812   .1431068     1.27   0.204     -.098803    .4621655
>              |
>    yr89#male |
>         1 1  |   .4542502   .3050139     1.49   0.136    -.1435661    1.052066
>              |
>        _cons |  -1.667376   .1021154   -16.33   0.000    -1.867518   -1.467233
> ------------------------------------------------------------------------------
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME:   (574)289-5227
> EMAIL:  Richard.A.Williams.5@ND.Edu
> WWW:    http://www.nd.edu/~rwilliam
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
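
The equivalence discussed in this thread can be checked by hand from the quoted output. The sketch below is illustrative only and was not part of the original posts: it is written in Python rather than Stata, the coefficient values are copied from the yr89##male and yr89#male tables, and the function names (`cell_from_factorial`, `n_params`) are hypothetical helpers. It rebuilds each "cell vs. reference cell" contrast of the a#b form from the main effects and interaction of the a##b form, and counts parameters using the (A-1) + (B-1) + (A-1)*(B-1) formula.

```python
# Coefficients copied from the yr89##male output (log-odds scale).
main_yr89 = -1.295833
main_male = 0.1816812
interaction = 0.4542502

# Cell contrasts copied from the yr89#male output; each compares a cell
# (yr89, male) with the omitted reference cell (0, 0).
cell_contrasts = {(0, 1): 0.1816812, (1, 0): -1.295833, (1, 1): -0.659902}


def cell_from_factorial(a, b):
    """Contrast of cell (a, b) with cell (0, 0), rebuilt from the
    main-effect and interaction coefficients (valid for 0/1 factors)."""
    return a * main_yr89 + b * main_male + a * b * interaction


def n_params(levels_a, levels_b):
    """Non-constant parameters: (A-1) + (B-1) + (A-1)*(B-1)."""
    return (levels_a - 1) + (levels_b - 1) + (levels_a - 1) * (levels_b - 1)


rebuilt = {cell: cell_from_factorial(*cell) for cell in cell_contrasts}
max_gap = max(abs(rebuilt[c] - cell_contrasts[c]) for c in cell_contrasts)

for cell in sorted(cell_contrasts):
    print(cell, round(rebuilt[cell], 6), cell_contrasts[cell])
print("largest difference:", max_gap)
print("parameters for a 2 x 2 layout:", n_params(2, 2))
```

The rebuilt contrasts agree with the yr89#male coefficients up to the rounding of the printed output, and the parameter count for two two-level factors matches the 3 degrees of freedom reported as LR chi2(3) in all three fits. The formula also simplifies to A*B - 1, which is the "one group omitted out of A*B cells" count in the oneway reading of a#b.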


**Follow-Ups**:
- **Re: Re-re-post: Stata 11 - Factor variables in a regression command** *From:* Richard Williams <Williams.NDA@comcast.net>

**References**:
- **Re-re-post: Stata 11 - Factor variables in a regression command** *From:* Ricardo Basurto <ricardobasurto@gmail.com>
- **Re: Re-re-post: Stata 11 - Factor variables in a regression command** *From:* Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com>
- **Re: Re-re-post: Stata 11 - Factor variables in a regression command** *From:* Richard Williams <Williams.NDA@comcast.net>
