
From: "Clive Nicholas" <[email protected]>
To: [email protected]
Subject: st: Concession
Date: Thu, 30 Oct 2003 22:33:41 -0000 (GMT)

```
Dave,
Thanks for the welcome. :-) It looks like my arguments were awry and I'm
being outnumbered, so I'll concede on this one!! *lol*
C.
> Clive,
>
> Welcome to the Stata community. My opinions added below.
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]]On Behalf Of Clive Nicholas
>> Sent: Thursday, October 30, 2003 10:20 AM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: st: MULTICOLLINEARITY & R-SQUARED
>>
>>
>> Richard,
>>
>> Thanks for your full reply to my thread. It's difficult to disagree with
>> most of what you say, but what I was attempting to demonstrate was what
>> happens to R^2 when the correlations between two or more statistically
>> significant X-variables of interest are most certainly *not* zero (say one
>> of 0.6). When this happens, R^2 is inflated, because the variation in Y is
>> explained not only by the unique contributions made to it by X1 and X2,
>> but also partly by the *overlap* (for the want of a more precise
>> expression!) between them.
>
> This seems to suggest that R^2, as a measure of explained variance, is
> larger than it should be when predictors are correlated. The reasoning
> suggests that the correlation between the predictors somehow artificially
> reduces the residual sum of squares. First, adding any variable to a
> regression equation (estimated by OLS) will increase R^2. This has nothing
> to do with (multi)collinearity, but rather a decrease in degrees of
> freedom, and is the reason many researchers prefer an "adjusted" R^2.
> Logic dictates that the explanatory power of a model will decrease as the
> correlation among predictors increases, because there's less independent
> information being added to the system. As long as the collinearity is not
> exact and we can actually estimate the equation, however, we know that the
> R^2 will necessarily increase due to the degrees-of-freedom issue. So, as
> a practical matter, one could claim that collinearity inflated the R^2,
> but that seems to obscure the true nature of the problem, which is a
> simple mathematical relation between R^2 and the number of predictors,
> regardless of collinearity. In other words, this is true even when the
> predictors are completely independent. If one suspects this is a problem,
> then the simple solution is to use R^2 adjusted for degrees of freedom.
> In any case, the "overlap" cannot increase the explanatory power of the
> model and definitely would not cause R^2 to increase. Rather, the
> opposite is true. Compare the R^2 from a three-variable equation in which
> Y is correlated with X1 at .2, with X2 at .2, and X1 and X2 are correlated
> at .9, to the same equation with the correlation between X1 and X2 now at
> .2. The R^2 for the latter equation (~.067) is much larger than the R^2
> for the former (~.042).
>
>
>> As I said towards the end of my last thread, one of the desired aims is
>> to build a model of explanatory variables which demonstrate *total
>> independence* of each other. But, since we as social scientists attempt
>> to model the determinants of human behaviour, that's little more than a
>> pious hope, since there will inevitably be some inter-correlation between
>> explanatory variables. The example I put forward demonstrates this, and
>> also invalidates the numerous futile attempts made by social scientists
>> to claim that X1 on its own contributed a certain proportion of the R^2
>> out of all the significant X's.
>
> I completely disagree with the "desired aim" of building models with
> explanatory variables that demonstrate total independence of each other.
> One would not need multiple regression if the explanatory variables were
> indeed totally independent, because the results of simple regressions
> (indeed, a correlation matrix would suffice) could simply be added up.
> Multiple regression is useful precisely because we wouldn't expect
> explanatory variables to be independent. I would go further and suggest
> that the world would be a pretty sorry place in which to live if all of
> our explanatory variables were truly independent of each other.
>
>
> Dave Moore
>
>
Yours,
CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
```
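Dave's first point, that R^2 can never fall when a predictor is added while adjusted R^2 penalizes the lost degree of freedom, is easy to check numerically. A minimal sketch (in Python with numpy, an assumption here, since the list's native tool is Stata; the data and variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.standard_normal(n)
y = x1 + rng.standard_normal(n)        # y genuinely depends on x1
noise = rng.standard_normal(n)         # pure noise, unrelated to y

def fit_r2(y, *xs):
    """OLS fit of y on an intercept plus the given predictors.
    Returns (R^2, adjusted R^2)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    k = X.shape[1] - 1                 # number of predictors, excluding intercept
    adj = 1.0 - (1.0 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj

r2_1, adj_1 = fit_r2(y, x1)
r2_2, adj_2 = fit_r2(y, x1, noise)
# R^2 can only go up when the noise column is added; adjusted R^2 need not.
print(r2_2 >= r2_1)
```

This is the degrees-of-freedom effect Dave describes: it has nothing to do with collinearity, since `noise` was drawn independently of everything else.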
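Dave's ~.067 vs ~.042 comparison can be reproduced exactly. For standardized variables, the population R^2 is r'R^(-1)r, where r holds the predictor-outcome correlations and R is the predictor correlation matrix. A sketch of that calculation (Python with numpy is an assumption; the function name is mine):

```python
import numpy as np

def pop_r2(r_xy, R_xx):
    """Population R^2 for standardized variables: r' R^{-1} r,
    with r the predictor-outcome correlations and R the
    predictor correlation matrix."""
    r = np.asarray(r_xy, dtype=float)
    R = np.asarray(R_xx, dtype=float)
    return float(r @ np.linalg.solve(R, r))

# Y correlates .2 with each of X1 and X2; only the X1-X2 correlation varies.
r = [0.2, 0.2]
high = pop_r2(r, [[1.0, 0.9], [0.9, 1.0]])  # X1, X2 correlated at .9
low  = pop_r2(r, [[1.0, 0.2], [0.2, 1.0]])  # X1, X2 correlated at .2

print(round(high, 3))  # 0.042
print(round(low, 3))   # 0.067
```

With two equal correlations the expression collapses to 2(.2)^2/(1 + rho), so raising the predictor correlation from .2 to .9 shrinks, not inflates, the explained variance, exactly as Dave argues.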
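Dave's closing point, that with totally independent predictors the simple regressions "could simply be added up", can also be demonstrated: with exactly orthogonal, mean-zero regressors, the multiple-regression R^2 equals the sum of the individual R^2s. A sketch under those assumptions (Python with numpy; the 2x2 factorial design is contrived for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two exactly orthogonal, mean-zero predictors (a replicated 2x2 factorial).
x1 = np.tile([1.0, -1.0, 1.0, -1.0], 25)
x2 = np.tile([1.0, 1.0, -1.0, -1.0], 25)
y = 0.5 * x1 + 0.3 * x2 + rng.standard_normal(100)

def r2(y, *xs):
    """R^2 from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss

full = r2(y, x1, x2)
# With orthogonal predictors the explained sums of squares decompose exactly.
print(np.isclose(full, r2(y, x1) + r2(y, x2)))  # True
```

Once the predictors are correlated, this decomposition fails, which is why attempts to attribute a definite share of R^2 to each X (the "futile attempts" Clive mentions) break down.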

**Follow-Ups**: **Re: st: Concession**, *From:* Richard Williams <[email protected]>

**References**: **Re: st: MULTICOLLINEARITY & R-SQUARED**, *From:* "Clive Nicholas" <[email protected]>
