# st: MULTICOLLINEARITY & R-SQUARED

From: "Clive Nicholas" <[email protected]>
To: [email protected]
Subject: st: MULTICOLLINEARITY & R-SQUARED
Date: Wed, 29 Oct 2003 05:09:20 -0000 (GMT)

```
Richard,

> At 01:46 AM 10/29/2003 +0000, Clive Nicholas wrote:
>
>>(a) Whatever is judged to be the 'best' measure of R^2, one *must* keep in
>>mind that (i) high levels of intercorrelation between X-variables inflate
>>R^2 to artificially high levels; and (ii) models deploying aggregate-level
>>data with large spatial units of analysis inevitably have knock-on
>>(upward) effects on R^2, regardless of its measurement;
>
> I'm not sure I understand (a)(i) -- Two Xs could be perfectly correlated
> with each other, and yet both could have zero correlation with Y.  Can you
> elaborate or give an example?

I can, and here is one. Let's take a hypothetical example from models of
British voting behaviour. Say I want to find out whether how people voted
in the European elections of 1999 had any impact on whether or not they
voted Conservative in 2001, net of socio-demographic, spatial and economic
(control) effects. Let's say we have a model which is:

CON2001 = a + CON99EU + LAB99EU + LDM99EU + ECON + SOCIOD + GEOG + e

We shan't bother discussing the control factors: we're really keen to know
about the level of impact represented by the first three terms: Tory,
Labour and Lib-Dem voting in 1999. If the data are 'fair' data and the
model has been specified correctly, we should find that the first has a
positive impact on CON2001 and the other two exert a negative impact.
However, the CON and LAB terms are likely to show a rather sizeable
(negative) correlation with each other: voters voting Tory are very
unlikely to vote Labour and vice versa. Because this overlap is carried
forward to the computation of R^2, R^2 is biased upwards.

What we would like in our models are X-variables that are *totally*
independent of each other. But, when studying social and political
phenomena in our complex world, there's little chance of that. It's just
one reason out of many why R^2 should always be interpreted with caution.

Yours,
CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.
```
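The exchange above turns on how R^2 behaves when two predictors are strongly correlated with each other. A small simulation lets a reader examine that. The sketch below is not part of the original post: the names `con99` and `lab99` are hypothetical stand-ins for CON99EU and LAB99EU, and the coefficients, noise level, sample size and correlation values are arbitrary assumptions. It fits an ordinary least squares model with NumPy and reports R^2 alongside the sample correlation between the two predictors.

```python
import numpy as np


def simulate_r2(rho, n=5000, b1=1.0, b2=-1.0, noise_sd=1.0, seed=0):
    """Simulate y = b1*con99 + b2*lab99 + e and return (R^2, corr(con99, lab99)).

    All parameter values are illustrative assumptions, not estimates
    from any real election data.
    """
    rng = np.random.default_rng(seed)

    # Two standardised predictors with population correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    con99, lab99 = x[:, 0], x[:, 1]

    # Outcome: positive effect of con99, negative effect of lab99, plus noise.
    y = b1 * con99 + b2 * lab99 + rng.normal(0.0, noise_sd, size=n)

    # OLS with an intercept, solved by least squares.
    design = np.column_stack([np.ones(n), con99, lab99])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef

    # R^2 = 1 - SSR/SST (residuals have mean zero because of the intercept).
    r2 = 1.0 - resid.var() / y.var()
    sample_corr = np.corrcoef(con99, lab99)[0, 1]
    return r2, sample_corr


if __name__ == "__main__":
    for rho in (0.0, -0.4, -0.8):
        r2, corr = simulate_r2(rho)
        print(f"rho={rho:+.1f}  sample corr={corr:+.3f}  R^2={r2:.3f}")
```

Changing `b1`, `b2` and `rho` shows how much of any movement in R^2 comes from the predictors' intercorrelation and how much from their relationship with the outcome. Setting `b1 = b2 = 0` with `rho` close to -1 or 1 reproduces the case raised in the quoted question, where the Xs are highly correlated with each other but unrelated to Y.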