

From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Incomplete results of linear regression with interaction variable
Date: Wed, 20 Mar 2013 19:30:37 -0400

Jean-Baptiste,

The table that lists the content of your database shows that n > 1100 in each of the four "cells" in the cross-classification by race and quality. Your regressions, however, use only the four cell means as observations. The interaction model spends all four of them on its four coefficients, which leaves zero residual degrees of freedom (the "Residual | 0 0" line in your output). As a result, -regress- cannot estimate the error variance or any standard errors. If you fit the interaction model to the detailed data (all 1159 + 1188 + 1159 + 1169 = 4675 observations), you will have plenty of degrees of freedom (4675 - 4 = 4671) to support standard errors for the four coefficients.

In each cell, sd_call is substantially larger than mean_call. You did not describe the nature of the data, but that pattern suggests that the data may have substantial skewness, especially if the individual observations cannot be negative. You may want to consider a transformation that renders the distributions within the cells roughly symmetric. Alternatively, you may be able to use a generalized linear model with a random component that models the behavior of the data. If the data are counts, a Poisson or negative binomial model may be appropriate.

David Hoaglin

On Wed, Mar 20, 2013 at 5:56 PM, Jean-Baptiste Peraldi <jean-baptiste.peraldi@unil.ch> wrote:
> Hi Statalisters,
>
> I want to run two linear regressions with dichotomous independent variables, where one of them contains an interaction variable.
> It appears that the regression with the interaction variable gives results only for the coefficients.
>
> Here is the content of my database:
> ***
> . list
>
>      +-----------------------------------------------------+
>      | race   quality   mean_call     sd_call      n   r_q |
>      |-----------------------------------------------------|
>   1. |    0         0    .0854185     .279624   1159     0 |
>   2. |    0         1    .1069024    .3091192   1188     0 |
>   3. |    1         0    .0569456    .2318388   1159     0 |
>   4. |    1         1    .0675791    .2511297   1169     1 |
>      +-----------------------------------------------------+
> ***
>
> The first regression is:
> " mean_call = cst + beta1*race "
> where "race" is a dichotomous (0 or 1) variable.
>
> The second regression contains an interaction variable:
> " mean_call = cst + beta1*race + beta2*quality + beta3*race*quality "
> where both "race" and "quality" are dichotomous (0 or 1) variables.
>
> When running the first regression, I get full results:
> ***
> . reg mean_call race
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  1,     2) =    8.00
>        Model |  .001149076     1  .001149076           Prob > F      =  0.1056
>     Residual |  .000287314     2  .000143657           R-squared     =  0.8000
> -------------+------------------------------           Adj R-squared =  0.7000
>        Total |   .00143639     3  .000478797           Root MSE      =  .01199
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |   -.033898   .0119857    -2.83   0.106    -.0854683    .0176723
>        _cons |   .0961604   .0084752    11.35   0.008     .0596947    .1326261
> ------------------------------------------------------------------------------
> ***
>
> For the second regression, I create the interaction variable and run the regression:
> ***
> . gen r_q = race*quality
> . reg mean_call race quality r_q
>
>       Source |       SS       df       MS              Number of obs =       4
> -------------+------------------------------           F(  3,     0) =       .
>        Model |   .00143639     3  .000478797           Prob > F      =       .
>     Residual |           0     0           .           R-squared     =  1.0000
> -------------+------------------------------           Adj R-squared =       .
>        Total |   .00143639     3  .000478797           Root MSE      =       0
>
> ------------------------------------------------------------------------------
>    mean_call |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         race |  -.0284728          .        .       .            .           .
>      quality |   .0214839          .        .       .            .           .
>          r_q |  -.0108504          .        .       .            .           .
>        _cons |   .0854185          .        .       .            .           .
> ------------------------------------------------------------------------------
> ***
> Here we can see that we get results only for the coefficients, which is quite weird. I would be glad if you could help me solve this problem.
> Thanks for your consideration.
>
> Jean-Baptiste P.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
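The point about fitting the model to the individual observations can be checked from the summary table alone. For the saturated race-by-quality model, OLS on the individual records fits each cell mean exactly. The four coefficients are therefore contrasts of cell means, and the residual variance is the pooled within-cell variance. The sketch below is in Python, not Stata, purely for illustration. It rebuilds what a plain -regress- on the 4675 individual records would report, under two assumptions: sd_call was computed with the usual n - 1 denominator, and the errors are homoskedastic, as in -regress- without a vce() option. The function name `saturated_ols_from_cells` is hypothetical.

```python
import math

# Cell summaries copied from the -list- output in the question:
# (race, quality) -> (mean_call, sd_call, n)
cells = {
    (0, 0): (0.0854185, 0.279624, 1159),
    (0, 1): (0.1069024, 0.3091192, 1188),
    (1, 0): (0.0569456, 0.2318388, 1159),
    (1, 1): (0.0675791, 0.2511297, 1169),
}

N_COEF = 4  # _cons, race, quality, r_q

# Regressing the four cell means on four coefficients leaves no residual df.
df_cells = len(cells) - N_COEF


def saturated_ols_from_cells(cells):
    """Hypothetical helper: OLS results for the saturated race*quality model
    on individual data, rebuilt from per-cell mean, SD and n.
    Assumes each SD uses the n - 1 denominator."""
    n_total = sum(n for _, _, n in cells.values())
    df_resid = n_total - N_COEF
    # The saturated model fits every cell mean exactly, so the residual SS
    # equals the pooled within-cell SS: sum of (n - 1) * sd^2.
    rss = sum((n - 1) * sd ** 2 for _, sd, n in cells.values())
    sigma2 = rss / df_resid  # homoskedastic error variance, as in plain -regress-
    m = {k: v[0] for k, v in cells.items()}
    inv_n = {k: 1.0 / v[2] for k, v in cells.items()}
    coef = {
        "_cons": m[0, 0],
        "race": m[1, 0] - m[0, 0],
        "quality": m[0, 1] - m[0, 0],
        "r_q": m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0],
    }
    # Each coefficient is a +/-1 contrast of independent cell means, each with
    # variance sigma2 / n, so its variance is sigma2 * sum(1/n) over those cells.
    se = {
        "_cons": math.sqrt(sigma2 * inv_n[0, 0]),
        "race": math.sqrt(sigma2 * (inv_n[1, 0] + inv_n[0, 0])),
        "quality": math.sqrt(sigma2 * (inv_n[0, 1] + inv_n[0, 0])),
        "r_q": math.sqrt(sigma2 * sum(inv_n.values())),
    }
    return df_resid, coef, se


df_resid, coef, se = saturated_ols_from_cells(cells)
print(f"residual df: cell means = {df_cells}, individual data = {df_resid}")
for name in ("race", "quality", "r_q", "_cons"):
    print(f"{name:>8}  {coef[name]: .7f}  SE {se[name]:.7f}")
```

The coefficients agree with the cell-mean regression in the question. Only the individual-level fit has residual degrees of freedom left over to estimate the error variance, and that is what makes the standard errors available.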

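The Poisson suggestion can also be sketched from the summary table if the individual observations are counts. For a Poisson model, the cell total y = mean_call × n is sufficient. A saturated log-linear model with exposure n (log link, offset log n) therefore gives the same estimates and Wald standard errors as fitting the individual records. The sketch assumes the observations are integer counts, which is why it rounds mean_call × n. It also uses the standard result that, at the MLE of a saturated Poisson model, each log cell rate has approximate variance 1/y. The coefficients are on the log-rate scale, so r_q measures interaction on a multiplicative scale rather than the additive scale of the linear model. The function name `saturated_poisson` is hypothetical. In Stata itself, the analogous fit would use -poisson- with its exposure() option on the individual data.

```python
import math

# Cell summaries from the -list- output: (race, quality) -> (mean_call, n)
cells = {
    (0, 0): (0.0854185, 1159),
    (0, 1): (0.1069024, 1188),
    (1, 0): (0.0569456, 1159),
    (1, 1): (0.0675791, 1169),
}


def saturated_poisson(cells):
    """Hypothetical helper: saturated Poisson log-linear model with exposure n.

    Assumes each individual observation is a count, so the cell total
    y = mean_call * n is sufficient for the cell rate.
    """
    # Recover cell totals; rounding assumes the totals are integers.
    y = {k: round(mean * n) for k, (mean, n) in cells.items()}
    log_rate = {k: math.log(y[k] / cells[k][1]) for k in cells}
    coef = {
        "_cons": log_rate[0, 0],
        "race": log_rate[1, 0] - log_rate[0, 0],
        "quality": log_rate[0, 1] - log_rate[0, 0],
        "r_q": log_rate[1, 1] - log_rate[1, 0] - log_rate[0, 1] + log_rate[0, 0],
    }
    # At the MLE the fitted total equals y, and the Wald variance of each
    # log cell rate is 1 / y; contrasts add these variances.
    inv_y = {k: 1.0 / v for k, v in y.items()}
    se = {
        "_cons": math.sqrt(inv_y[0, 0]),
        "race": math.sqrt(inv_y[1, 0] + inv_y[0, 0]),
        "quality": math.sqrt(inv_y[0, 1] + inv_y[0, 0]),
        "r_q": math.sqrt(sum(inv_y.values())),
    }
    return y, coef, se


y, coef, se = saturated_poisson(cells)
print("cell totals:", y)
for name in ("race", "quality", "r_q", "_cons"):
    print(f"{name:>8}  {coef[name]: .4f}  SE {se[name]:.4f}")
```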