# Re: st: Can multicollinearity problems be resolved by using residuals from another regression?

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Can multicollinearity problems be resolved by using residuals from another regression?
Date: Mon, 26 Nov 2012 21:18:02 -0500

```
Dear A. Shaul,

In equation (1),
b1 reflects the relation of y to x1 after adjusting for the
contributions of (x1)^2 and x2;
b2 reflects the relation of y to (x1)^2 after adjusting for the
contributions of x1 and x2; and
b3 reflects the relation of y to x2 after adjusting for the
contributions of x1 and (x1)^2.
Thus, if x1 and (x1)^2 have a substantial linear relation (as often
happens when x1 has not been centered; centering was a good suggestion
in another reply), it can happen that neither b1 nor b2 differs
significantly from zero.

Including x2 in equation (1) introduces the adjustments for the
contribution of x2 mentioned above.  You may want to regress y on x2
and a constant and regress x1 on x2 and a constant (producing y_res
and x1_res, respectively), and then regress y_res on x1_res and
(x1_res)^2 and a constant.

Why do you expect the nonlinear effect of x1 to be quadratic (or, more
generally, polynomial)?

If x1 does not affect x2, equation (2) is backward.

The definitions of "b1", "b2", and "b3" in equation (3) differ from
those in equation (1) because the two models do not contain the same
predictors (the list of other predictors in the model is part of the
definition of each coefficient in a multiple-regression model).

If you need to investigate collinearity, you can use -coldiag2-.  I
installed it recently from fmwww.bc.edu/RePEc/bocode/c.

I hope this discussion helps.

David Hoaglin

On Thu, Nov 8, 2012 at 9:36 PM, A. Shaul <3c5171@gmail.com> wrote:
> Dear Statalist,
>
> I expect a non-linear effect of an exogenous variable, x1, on a
> dependent variable, y. The variable x1 is affected by another
> exogenous variable, x2. The variable x2 affects x1 directly and also y
> directly. The variable x1 does not affect x2. I am only interested in
> the partial effect of x1 on y while controlling for x2 --- or at least
> while controlling for the part of the variation in x2 that affects y
> directly.
>
> I have the following regression equation:
>
>    (1)   y = b1*x1 + b2*(x1)^2 + b3*x2 + constant
>
> Although I get the expected estimates of b1 and b2, they are
> insignificant. They are, however, significant if I exclude x2. I
> believe this is the result of collinearity between x1 and x2 because
> x1 is affected by x2. I have tried to resolve the problem by first
> running the regression
>
>    (2)   x2 = x1 + constant
>
> and then generating the variable x2_res consisting of the residuals
> from regression (2). I have then modified regression model (1) by
> substituting x2 with x2_res, i.e. I then estimate the model:
>
>    (3)   y = b1*x1 + b2*(x1)^2 + b3*x2_res + constant
>
> The coefficients b1 and b2 are now significant. This is also the case
> if I used an n>2 degree polynomial in x1 in model (2).
>
> My thinking is that controlling for x2_res corresponds to controlling
> for the part of the variation of x2 that is not affecting x1.
>
> Does this make sense?
```
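Three points in the reply can be checked numerically: centering x1 weakens its linear relation with (x1)^2, swapping x2 for x2_res changes what b1 means, and the suggested alternative regresses y_res on x1_res and (x1_res)^2. The following is a minimal sketch in Python rather than Stata, using only the standard library. The simulated data, the coefficient values used to generate it, and the helper names `ols`, `residuals`, and `corr` are all assumptions made for illustration; none of them come from the thread.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the given columns plus a constant (returned last)."""
    n = len(y)
    X = [list(row) + [1.0] for row in zip(*columns)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

def residuals(columns, y):
    """Residuals from regressing y on the given columns and a constant."""
    coefs = ols(columns, y)
    return [yi - (sum(c * v for c, v in zip(coefs, row)) + coefs[-1])
            for row, yi in zip(zip(*columns), y)]

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

def sq(xs):
    return [v * v for v in xs]

# Simulated data (assumed for illustration): x2 affects x1 and y; x1 has a quadratic effect on y.
random.seed(0)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [3 + 0.8 * v + random.gauss(0, 0.6) for v in x2]
y = [2 + 1.0 * a - 0.2 * a * a + 0.7 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Centering: compare the correlation of x1 with its square before and after.
mean_x1 = sum(x1) / n
x1_c = [v - mean_x1 for v in x1]
corr_raw = corr(x1, sq(x1))
corr_centered = corr(x1_c, sq(x1_c))

# Equation (1): y on x1, (x1)^2, x2.
b_model1 = ols([x1, sq(x1), x2], y)

# Equation (2): x2 on x1; equation (3): equation (1) with x2 replaced by x2_res.
slope, intercept = ols([x1], x2)
x2_res = residuals([x1], x2)
b_model3 = ols([x1, sq(x1), x2_res], y)

# Alternative from the reply: partial x2 out of y and x1, then fit the quadratic.
y_res = residuals([x2], y)
x1_res = residuals([x2], x1)
b_partial = ols([x1_res, sq(x1_res)], y_res)

print("corr(x1, x1^2) raw vs centered:", corr_raw, corr_centered)
print("model (1) b1, b2, b3, const:   ", b_model1)
print("model (3) b1, b2, b3, const:   ", b_model3)
print("y_res on x1_res, x1_res^2:     ", b_partial)
```

Because x2 = x2_res + intercept + slope*x1 exactly, models (1) and (3) give the same fitted values. In this sketch the estimates of b2 and b3 coincide across the two models, while the coefficient on x1 in model (3) equals b1 + b3*slope from model (1). That is one concrete sense in which the list of other predictors is part of the definition of each coefficient.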