Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: My ANOVA and regression results don't agree
From: "Pepper, Jessica" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: My ANOVA and regression results don't agree
Date: Tue, 7 Jan 2014 15:42:08 +0000
Thank you both, David and Phil, for your responses. When I am back in the office and have access to Stata again tomorrow, I will play around with the drug/disease sample data set and try using the -contrast- command.
I am new to Statalist and not sure of the etiquette -- if I have follow-up questions, is it best to send to the whole list or follow up individually?
Thank you again.
Jess
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Hoaglin
Sent: Monday, January 06, 2014 10:00 PM
To: [email protected]
Subject: Re: st: My ANOVA and regression results don't agree
Phil,
Doesn't the ANOVA of those systolic data have some features of regression, because of the imbalance (18 observations in 4 cells)? In the Partial SS column, the sum of the SS for drug, disease, and drug#disease (about 1731.66) is less than the SS for the model (2052.05).
David Hoaglin
On Mon, Jan 6, 2014 at 9:10 PM, Phil Schumm <[email protected]> wrote:
> On Jan 6, 2014, at 4:53 PM, Pepper, Jessica <[email protected]> wrote:
>> Thanks for sending that link. I followed those instructions and got results that made sense. I just have 2 follow-up questions:
>>
>> 1. I understand that the 2 approaches (ANOVA and regress/test) don't correspond. When I follow the UCLA procedure that you sent the link to, it confirms what I initially found in the ANOVA and also shows me the contrasts, which is what I really need. All that is great. But why, in essence, should I "trust" the ANOVA over the regression? Why are the p values from the regression wrong?
>
>
> They're not wrong, just testing a different hypothesis given the way the covariates are coded. For example, consider the following simple example:
>
>
> . use http://www.stata-press.com/data/r13/systolic if inlist(drug,1,3) ///
> & inlist(disease,1,2)
>
> . anova systolic drug##disease
>
> Number of obs = 18 R-squared = 0.5706
> Root MSE = 10.5015 Adj R-squared = 0.4786
>
> Source | Partial SS df MS F Prob > F
> -------------+----------------------------------------------------
> Model | 2052.05 3 684.016667 6.20 0.0067
> |
> drug | 1429.39211 1 1429.39211 12.96 0.0029
> disease | 178.35117 1 178.35117 1.62 0.2242
> drug#disease | 123.918421 1 123.918421 1.12 0.3071
> |
> Residual | 1543.95 14 110.282143
> -------------+----------------------------------------------------
> Total | 3596 17 211.529412
>
> . reg systolic i.drug##i.disease, noheader
> ------------------------------------------------------------------------------
> systolic | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> 3.drug | -13 7.425703 -1.75 0.102 -28.92655 2.92655
> 2.disease | -1.083333 6.778709 -0.16 0.875 -15.62222 13.45555
> |
> drug#disease |
> 3 2 | -10.85 10.23563 -1.06 0.307 -32.80323 11.10323
> |
> _cons | 29.33333 4.287232 6.84 0.000 20.13814 38.52853
> ------------------------------------------------------------------------------
>
> . test 3.drug + 3.drug#2.disease/2 = 0
>
> ( 1) 3.drug + .5*3.drug#2.disease = 0
>
> F( 1, 14) = 12.96
> Prob > F = 0.0029
>
>
> The main effect of drug in the regression corresponds to the difference between drugs 3 and 1 *among those with disease 1 only*. In contrast, the main effect of drug in the ANOVA corresponds to the overall (or marginal in the case of balanced data) difference between drugs 3 and 1. This is what people usually mean when they talk about a main effect. Of course, although in this case there is no evidence of an interaction between drug and disease, if there were, then the main effects might not be very meaningful.
>
> Note that you can also get the same main effect(s) after -regress- with -contrast-:
>
>
> . contrast g.drug, noeffects
>
> Contrasts of marginal linear predictions
>
> Margins : asbalanced
>
> ------------------------------------------------
> | df F P>F
> -------------+----------------------------------
> drug |
> (1 vs mean) | 1 12.96 0.0029
> (3 vs mean) | 1 12.96 0.0029
> Joint | 1 12.96 0.0029
> |
> Denominator | 14
> ------------------------------------------------
>
>
> which is easier if your covariate(s) have many levels.
>
>
>> 2. The procedure on the UCLA site defaults to treating the highest level of the variable as the reference category. That doesn't matter for my two level variable, but it does for my 3 level variable, correct? And if so, is there an easy way to tell it to treat the lowest level as the reference category? Or should I just manually create a new variable that switches those levels.
>
>
> IIRC, the UCLA FAQ created the dummy variables manually (with the -generate- option to -tab-). An easier option (for Stata 11 and higher) is to use factor variables, as I have illustrated in the regression above. These make it easy to change the base category (e.g., using ib3.drug in my example above would have caused Stata to use 3 as the base (drug) category instead of 1). Of course, which category you use as the base doesn't affect the model -- only the way the coefficients are presented in the table. Post-estimation, you can always obtain whatever contrast(s) you want.
>
>
>> I hope these questions make sense. I am new to Stata and have never encountered a situation where ANOVA and regression don't agree.
>
>
> That's a misconception -- they do agree. What differs are the coefficients that result from a model matrix representing deviations from the (balanced) grand mean (as used in ANOVA) versus those resulting from dummy variables. As demonstrated above, however, it's easy to switch between these after you've fit the model. The same would be true in R or SAS.
>
>
> -- Phil
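
The two points in Phil's reply (the balanced main effect as a linear combination of the dummy-coded coefficients, and changing the base category with factor variables) can be sketched as follows. This is only an illustrative sketch, not part of the original exchange; it assumes the same subset of the systolic data and Stata 12 or later (for -contrast-), and it uses the documented -use ... using- syntax for loading a subset.

. use if inlist(drug,1,3) & inlist(disease,1,2) using http://www.stata-press.com/data/r13/systolic, clear

. * default dummy coding: drug 1 and disease 1 are the base categories
. regress systolic i.drug##i.disease

. * balanced main effect of drug: the average of the drug 3 vs 1 difference
. * in disease 1 (3.drug) and in disease 2 (3.drug + 3.drug#2.disease)
. lincom 3.drug + 0.5*3.drug#2.disease

. * same model with drug 3 as the base category; only the coefficients change
. regress systolic ib3.drug##i.disease

. * drug main effect relative to the base level, averaged over disease
. contrast r.drug, noeffects

With the coefficients in the regression table above, the -lincom- estimate works out to -13 + 0.5*(-10.85) = -18.425. It tests the same hypothesis as the -test- command in Phil's example. Because refitting with ib3.drug leaves the fitted model unchanged, the -contrast- result should not depend on which base category is used.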
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/