Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: My ANOVA and regression results don't agree

From	David Hoaglin <[email protected]>
To	[email protected]
Subject	Re: st: My ANOVA and regression results don't agree
Date	Mon, 6 Jan 2014 21:59:34 -0500

Phil,

Doesn't the ANOVA of those systolic data have some features of
regression, because of the imbalance (18 observations in 4 cells)?  In
the Partial SS column, the sum of the SS for drug, disease, and
drug#disease is less than the SS for the model.
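
A quick way to see this, sketched under the assumption that the 18
observations from Phil's -use- command are still in memory: the three
partial SS add up to about 1731.66, short of the model SS of 2052.05.
With sequential SS, each term is adjusted only for the terms listed
before it, so the pieces should sum to the model SS.

    . * sum of the three partial SS from the ANOVA table
    . display 1429.39211 + 178.35117 + 123.918421

    . * same model with sequential SS instead of partial SS
    . anova systolic drug##disease, sequential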

David Hoaglin

On Mon, Jan 6, 2014 at 9:10 PM, Phil Schumm <[email protected]> wrote:
> On Jan 6, 2014, at 4:53 PM, Pepper, Jessica <[email protected]> wrote:
>> Thanks for sending that link. I followed those instructions and got results that made sense. I just have 2 follow-up questions:
>>
>> 1. I understand that the two approaches (ANOVA and regress/test) don't correspond. When I follow the UCLA procedure that you sent the link to, it confirms what I initially found in the ANOVA and also shows me the contrasts, which is what I really need. All that is great. But why, in essence, should I "trust" the ANOVA over the regression? Why are the p-values from the regression wrong?
>
>
> They're not wrong; they just test a different hypothesis, given the way the covariates are coded.  Consider the following simple example:
>
>
>     . use http://www.stata-press.com/data/r13/systolic if inlist(drug,1,3) ///
>         & inlist(disease,1,2)
>
>     . anova systolic drug##disease
>
>                        Number of obs =      18     R-squared     =  0.5706
>                        Root MSE      = 10.5015     Adj R-squared =  0.4786
>
>               Source |  Partial SS    df       MS           F     Prob > F
>         -------------+----------------------------------------------------
>                Model |     2052.05     3  684.016667       6.20     0.0067
>                      |
>                 drug |  1429.39211     1  1429.39211      12.96     0.0029
>              disease |   178.35117     1   178.35117       1.62     0.2242
>         drug#disease |  123.918421     1  123.918421       1.12     0.3071
>                      |
>             Residual |     1543.95    14  110.282143
>         -------------+----------------------------------------------------
>                Total |        3596    17  211.529412
>
>     . reg systolic i.drug##i.disease, noheader
>     ------------------------------------------------------------------------------
>         systolic |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>     -------------+----------------------------------------------------------------
>           3.drug |        -13   7.425703    -1.75   0.102    -28.92655     2.92655
>        2.disease |  -1.083333   6.778709    -0.16   0.875    -15.62222    13.45555
>                  |
>     drug#disease |
>             3 2  |     -10.85   10.23563    -1.06   0.307    -32.80323    11.10323
>                  |
>            _cons |   29.33333   4.287232     6.84   0.000     20.13814    38.52853
>     ------------------------------------------------------------------------------
>
>     . test 3.drug + 3.drug#2.disease/2 = 0
>
>      ( 1)  3.drug + .5*3.drug#2.disease = 0
>
>            F(  1,    14) =   12.96
>                 Prob > F =    0.0029
>
>
> The main effect of drug in the regression corresponds to the difference between drugs 3 and 1 *among those with disease 1 only*.  In contrast, the main effect of drug in the ANOVA corresponds to the overall (or marginal in the case of balanced data) difference between drugs 3 and 1.  This is what people usually mean when they talk about a main effect.  Of course, although in this case there is no evidence of an interaction between drug and disease, if there were, then the main effects might not be very meaningful.
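>
> As a rough sketch of that distinction (assuming the -regress- results above are still the active estimates), -lincom- can pull out the drug difference within each disease group and their unweighted average.  From the coefficient table, these should come to -13, -13 + (-10.85) = -23.85, and -13 + 0.5*(-10.85) = -18.425; the last is the quantity that the ANOVA main effect tests.
>
>
>     . * drug 3 vs drug 1 among those with disease 1
>     . lincom 3.drug
>
>     . * drug 3 vs drug 1 among those with disease 2
>     . lincom 3.drug + 3.drug#2.disease
>
>     . * unweighted average of the two differences
>     . lincom 3.drug + 3.drug#2.disease/2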
>
> Note that you can also get the same main effect(s) after -regress- with -contrast-:
>
>
>     . contrast g.drug, noeffects
>
>     Contrasts of marginal linear predictions
>
>     Margins      : asbalanced
>
>     ------------------------------------------------
>                  |         df           F        P>F
>     -------------+----------------------------------
>             drug |
>     (1 vs mean)  |          1       12.96     0.0029
>     (3 vs mean)  |          1       12.96     0.0029
>           Joint  |          1       12.96     0.0029
>                  |
>      Denominator |         14
>     ------------------------------------------------
>
>
> which is easier if your covariate(s) have many levels.
>
>
>> 2. The procedure on the UCLA site defaults to treating the highest level of the variable as the reference category. That doesn't matter for my two-level variable, but it does for my three-level variable, correct? And if so, is there an easy way to tell it to treat the lowest level as the reference category? Or should I just manually create a new variable that switches those levels?
>
>
> IIRC, the UCLA FAQ created the dummy variables manually (with the -generate- option to -tab-).  An easier option (for Stata 11 and higher) is to use factor variables, as I have illustrated in the regression above.  These make it easy to change the base category (e.g., using ib3.drug in my example above would have caused Stata to use 3 as the base (drug) category instead of 1).  Of course, which category you use as the base doesn't affect the model -- only the way the coefficients are presented in the table.  Post-estimation, you can always obtain whatever contrast(s) you want.
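>
> A minimal sketch of both ways to change the base, assuming the same 18 observations are in memory.  Relative to drug 3, the 1.drug coefficient should be +13, the mirror image of the -13 shown earlier, and the -contrast- result should be unchanged.
>
>
>     . * set the base level for this one command only
>     . regress systolic ib3.drug##i.disease, noheader
>
>     . * the ANOVA-style main effect does not depend on the base
>     . contrast g.drug, noeffects
>
>     . * or make 3 the default base for drug in later commands
>     . fvset base 3 drug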
>
>
>> I hope these questions make sense. I am new to Stata and have never encountered a situation where ANOVA and regression don't agree.
>
>
> That's a misconception -- they do agree.  What differs are the coefficients that result from a model matrix representing deviations from the (balanced) grand mean (as used in ANOVA) versus those resulting from dummy variables.  As demonstrated above, however, it's easy to switch between these after you've fit the model.  The same would be true in R or SAS.
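>
> To make that concrete, here is a sketch that builds the deviation (effect) coding by hand; the variable names d and s are made up for illustration.  With +1/-1 codes, the coefficient on d is half the unweighted average drug difference, -18.425/2 = -9.2125, so its test should reproduce F = 12.96 from the ANOVA table.
>
>
>     . * +1/-1 codes in place of 0/1 dummies
>     . generate d = cond(drug == 3, 1, -1)
>     . generate s = cond(disease == 2, 1, -1)
>
>     . * same fitted cell means, different parameterization
>     . regress systolic c.d##c.s, noheader
>
>     . * test of the drug main effect
>     . test d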
>
>
> -- Phil

