Re: st: MULTICOLLINEARITY & R-SQUARED

Thu, 30 Oct 2003 18:20:12 -0000 (GMT)

Richard,

Thanks for your full reply to my thread. It's difficult to disagree with most of what you say, but what I was attempting to demonstrate was what happens to R^2 when the correlations between two or more statistically significant X-variables of interest are most certainly *not* zero (say, one of 0.6). When this happens, R^2 is inflated, because the variation in Y is explained not only by the unique contributions made to it by X1 and X2, but also partly by the *overlap* (for want of a more precise expression!) between them. As I said towards the end of my last thread, one of the desired aims is to build a model of explanatory variables which are *totally independent* of each other. But since we as social scientists attempt to model the determinants of human behaviour, that's little more than a pious hope: there will inevitably be some inter-correlation between explanatory variables. The example I put forward demonstrates this, and also invalidates the numerous futile attempts by social scientists to claim that X1 on its own contributed a certain proportion of the R^2 out of all the significant X's.

C.

> At 05:09 AM 10/29/2003 +0000, Clive Nicholas wrote:
>>unlikely to vote Labour and vice versa. Because this overlap is carried
>>forward to the computation of R^2, R^2 has been upwardly biased.
>
> Thanks, but I'm afraid I still don't follow. If the beta coefficients were
> all zero, R^2 would be zero. Further, while the intercorrelations of the
> Xs may affect how large R^2 is, I don't see how that causes R^2 to be
> "upwardly biased", i.e. just because something causes R^2 to be bigger
> doesn't mean that it becomes biased towards a larger value. I'm aware of
> various consequences of multicollinearity, e.g. large standard errors,
> large confidence intervals, increased likelihood of saying a coefficient
> does not differ from zero when it really does. But, I don't remember ever
> hearing "upwardly biased R^2" as a problem.
> But that doesn't mean I couldn't have missed it! But multicollinearity does
> not cause regression coefficients to be biased (wildly variable from one
> sample to the next, maybe, but not biased) so I am not sure why it would
> cause R^2 to be biased.
>
> What I might say instead is, suppose you have two populations. In both
> populations, the effects of the Xs on Y are identical. But, in one
> population, the Xs are much more highly correlated with each other than
> they are in the other population. This will likely cause the R^2 to differ
> between the 2 populations. If you just compared R^2 between the two
> populations and not the actual coefficients, you could get a very
> misleading idea of the differences between the two populations. These
> kinds of ideas are discussed in my "Evils of R^2" handout at
> http://www.nd.edu/~rwilliam/xsoc593/lectures/l16.pdf.
>
> -------------------------------------------
> Richard Williams, Associate Professor
> OFFICE: (574)631-6668, (574)631-6463
> FAX: (574)288-4373
> HOME: (574)289-5227
> EMAIL: [email protected]
> WWW (personal): http://www.nd.edu/~rwilliam
> WWW (department): http://www.nd.edu/~soc
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

Yours,

CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.
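To make the two-population point concrete, here is a minimal Python sketch. It is not from either poster; the coefficient values, the error variance and the function names population_r2 and zero_order_correlations are hypothetical. It assumes a two-predictor linear model Y = b1*X1 + b2*X2 + e, with X1 and X2 standardised and correlated at rho, and an error term uncorrelated with both. It computes the population R^2 in closed form, so no sampling (and hence no estimator bias) is involved at all.

```python
import math


def population_r2(b1: float, b2: float, rho: float, sigma2: float) -> float:
    """Population R^2 for Y = b1*X1 + b2*X2 + e.

    Assumes Var(X1) = Var(X2) = 1, Corr(X1, X2) = rho, Var(e) = sigma2,
    and e uncorrelated with both Xs (all hypothetical illustration values).
    """
    explained = b1**2 + b2**2 + 2 * b1 * b2 * rho  # Var(b1*X1 + b2*X2)
    return explained / (explained + sigma2)


def zero_order_correlations(b1: float, b2: float, rho: float, sigma2: float) -> tuple:
    """Simple correlations of Y with X1 and with X2 under the same assumptions."""
    explained = b1**2 + b2**2 + 2 * b1 * b2 * rho
    sd_y = math.sqrt(explained + sigma2)
    r_y1 = (b1 + b2 * rho) / sd_y  # Cov(Y, X1) = b1 + b2*rho
    r_y2 = (b2 + b1 * rho) / sd_y  # Cov(Y, X2) = b2 + b1*rho
    return r_y1, r_y2


if __name__ == "__main__":
    # Two hypothetical populations: identical coefficients and error
    # variance, differing only in how strongly X1 and X2 are correlated.
    b1, b2, sigma2 = 0.5, 0.5, 1.0
    for rho in (0.0, 0.6):
        r2 = population_r2(b1, b2, rho, sigma2)
        r_y1, r_y2 = zero_order_correlations(b1, b2, rho, sigma2)
        print(f"rho={rho:.1f}  R^2={r2:.3f}  "
              f"r_y1^2 + r_y2^2={r_y1**2 + r_y2**2:.3f}")
```

Run as a script, it prints R^2 = 0.333 at rho = 0.0 and R^2 = 0.444 at rho = 0.6, even though the coefficients are identical in both populations: the correlation among the Xs changes R^2 without any estimator being biased. The sum of the squared simple correlations (0.333 versus 0.711) equals R^2 only when rho = 0, which is why the explained variance cannot be split uniquely between X1 and X2 once they overlap. If b1 and b2 had opposite signs, a positive rho would lower R^2 rather than raise it.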

