Re: st: Incomplete results of linear regression with interaction variable


From   David Hoaglin <dchoaglin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Incomplete results of linear regression with interaction variable
Date   Wed, 20 Mar 2013 19:30:37 -0400

Jean-Baptiste,

The table that lists the content of your database shows that n > 1100
in each of the four "cells" in the cross-classification by race and
quality.  If you fit the interaction model to the detailed data (all
1159 + 1188 + 1159 + 1169 = 4675 observations), you will have plenty
of degrees of freedom to support standard errors for the four
coefficients.
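
To make the degrees-of-freedom point concrete, here is a minimal sketch that
re-enters the four cell means from your listing and fits the same interaction
model using factor-variable notation. The use of -input- to rebuild the
four-row dataset and the i.race##i.quality notation are my choices, not
something taken from your do-file.

```stata
* Sketch only: rebuild the four-row summary dataset from the listing
clear
input race quality mean_call n
0 0 .0854185 1159
0 1 .1069024 1188
1 0 .0569456 1159
1 1 .0675791 1169
end

* Four observations and four parameters: the model is saturated
regress mean_call i.race##i.quality

* Residual degrees of freedom stored by -regress-
display e(df_r)

* Residual df available if the same model is fit to all 4675 observations
display 1159 + 1188 + 1159 + 1169 - 4
```

With the individual-level data in memory, the corresponding command would be
something like -regress call i.race##i.quality-, where -call- is a
hypothetical name for the observation-level outcome.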

In each cell, sd_call is substantially larger than mean_call.  You did
not describe the nature of the data, but that pattern suggests that
the data may have substantial skewness, especially if the individual
observations cannot be negative.  You may want to consider a
transformation that renders the distributions within the cells roughly
symmetric.  Alternatively, you may be able to use a generalized linear
model with a random component that models the behavior of the data.
If the data are counts, a Poisson or negative binomial model may be
appropriate.
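
As a rough illustration of the count-model route, the sketch below simulates
data with the same two-by-two layout and fits a Poisson regression. Everything
in it is an assumption for illustration: the variable name -call-, the seed,
the sample size, and the coefficients used to generate the data are made up
and are not estimates from your data.

```stata
* Sketch only: simulated counts, NOT the original data
clear
set seed 12345
set obs 4000

* Build a balanced 2 x 2 design from the observation number
generate race    = mod(_n, 2)
generate quality = mod(floor((_n - 1)/2), 2)

* Hypothetical count outcome with a log-linear mean
generate call = rpoisson(exp(-2.5 - 0.4*race + 0.2*quality))

* Poisson regression with main effects and the interaction
poisson call i.race##i.quality

* If the counts look overdispersed, the negative binomial analogue is
* nbreg call i.race##i.quality
```

On real data you would drop the simulation lines and run the model commands
on the observation-level outcome.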

David Hoaglin

On Wed, Mar 20, 2013 at 5:56 PM, Jean-Baptiste Peraldi
<jean-baptiste.peraldi@unil.ch> wrote:
> Hi Statalisters,
>
> I want to run two linear regressions with dichotomous independent variables, where one contains an interaction variable.
> It appears that the regression with the interaction variable gives only results for the coefficients.
>
> Here is the content of my database:
> ***
> . list
>      +----------------------------------------------------+
>      | race   quality   mean_call    sd_call      n   r_q |
>      |----------------------------------------------------|
>   1. |    0         0    .0854185    .279624   1159     0 |
>   2. |    0         1    .1069024   .3091192   1188     0 |
>   3. |    1         0    .0569456   .2318388   1159     0 |
>   4. |    1         1    .0675791   .2511297   1169     1 |
>      +----------------------------------------------------+
> ***
>
>
> The first regression is :
> " mean_call = cst + beta1*race "
> where "race" is a dichotomous (0 or 1) variable.
>
> The second regression contains an interaction variable :
> " mean_call = cst + beta1*race + beta2*quality + beta3*race*quality " where both "race" and "quality" are dichotomous (0 or 1) variables.
>
> When running the first regression, I get full results:
> ***
> . reg mean_call race
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  1,     2) =    8.00
>        Model |  .001149076     1  .001149076           Prob > F      =  0.1056
>     Residual |  .000287314     2  .000143657           R-squared     =  0.8000
> -------------+------------------------------           Adj R-squared =  0.7000
>        Total |   .00143639     3  .000478797           Root MSE      =  .01199
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |   -.033898   .0119857    -2.83   0.106    -.0854683    .0176723
>        _cons |   .0961604   .0084752    11.35   0.008     .0596947    .1326261
> ------------------------------------------------------------------------------
> ***
>
> For the second regression, I create the interaction variable and run the regression
> ***
> . gen r_q = race*quality
> . reg mean_call race quality r_q
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  3,     0) =       .
>        Model |   .00143639     3  .000478797           Prob > F      =       .
>     Residual |           0     0           .           R-squared     =  1.0000
> -------------+------------------------------           Adj R-squared =       .
>        Total |   .00143639     3  .000478797           Root MSE      =       0
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |  -.0284728          .        .       .            .           .
>      quality |   .0214839          .        .       .            .           .
>          r_q |  -.0108504          .        .       .            .           .
>        _cons |   .0854185          .        .       .            .           .
> ------------------------------------------------------------------------------
> ***
> Here we can see that we get results for the coefficients only, which is quite weird. I will be glad if you can help me solve this problem.
> Thanks for your consideration.
>
> Jean-Baptiste P.