
st: Concession


From   "Clive Nicholas" <[email protected]>
To   [email protected]
Subject   st: Concession
Date   Thu, 30 Oct 2003 22:33:41 -0000 (GMT)

Dave,

Thanks for the welcome. :-) It looks like my arguments were awry and I'm
being outnumbered, so I'll concede on this one!! *lol*

C.

> Clive,
>
> Welcome to the Stata community.  My opinions added below.
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]]On Behalf Of Clive Nicholas
>> Sent: Thursday, October 30, 2003 10:20 AM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: st: MULTICOLLINEARITY & R-SQUARED
>>
>>
>> Richard,
>>
>> Thanks for your full reply to my thread. It's difficult to disagree with
>> most of what you say, but what I was attempting to demonstrate was what
>> happens to R^2 when the correlations between two or more statistically
>> significant X-variables of interest are most certainly *not* zero (say one
>> of 0.6). When this happens, R^2 is inflated, because the variation in Y is
>> partly explained not only by the unique contributions made to it by X1 and
>> X2, but also by the *overlap* (for want of a more precise expression!)
>> between them.
>
> This seems to suggest that R^2, as a measure of explained variance, is
> larger than it should be when predictors are correlated.  The reasoning
> suggests that the correlation between the predictors somehow artificially
> reduces the residual sum of squares.  First, adding any variable to a
> regression equation (estimated by OLS) will increase R^2.  This has nothing
> to do with (multi)collinearity, but rather reflects a decrease in degrees of
> freedom, and is the reason many researchers prefer an "adjusted" R^2.  Logic
> dictates that the explanatory power of a model will decrease as the
> correlation among predictors increases, because there's less independent
> information being added to the system.  As long as the collinearity is not
> exact and we can actually estimate the equation, however, we know that the
> R^2 will necessarily increase due to the degrees of freedom issue.  So, as a
> practical matter, one could claim that collinearity inflated the R^2, but
> that seems to obscure the true nature of the problem, which is a simple
> mathematical relation between R^2 and the number of predictors, regardless
> of collinearity.  In other words, this is true even when the predictors are
> completely independent.  If one suspects this is a problem, then the simple
> solution is to use R^2 adjusted for degrees of freedom.  In any case, the
> "overlap" cannot increase the explanatory power of the model and definitely
> would not cause R^2 to increase.  Rather, the opposite is true.  Compare the
> R^2 from a three-variable equation in which Y is correlated with X1 at .2,
> with X2 at .2, and X1 and X2 are correlated at .9, to the same equation with
> the correlation between X1 and X2 now at .2.  The R^2 for the latter
> equation (~.067) is much larger than the R^2 for the former (~.042).
>
>
>> As I said towards the end of my last thread, one of the desired aims is to
>> build a model of explanatory variables that demonstrate *total
>> independence* of each other. But, since we as social scientists attempt to
>> model the determinants of human behaviour, that's little more than a pious
>> hope, since there will inevitably be some inter-correlation between
>> explanatory variables. The example I put forward demonstrates this, and
>> also invalidates the numerous futile attempts made by social scientists to
>> claim that X1 on its own contributed a certain proportion of the R^2 out of
>> all the significant X's.
>
> I completely disagree with the "desired aim" of building models with
> explanatory variables that demonstrate total independence of each other.
> One would not need multiple regression if the explanatory variables were
> indeed totally independent, because the results of simple regressions
> (indeed, a correlation matrix would suffice) could simply be added up.
> Multiple regression is useful precisely because we wouldn't expect
> explanatory variables to be independent.  I would go further and suggest
> that the world would be a pretty sorry place in which to live if all of our
> explanatory variables were truly independent of each other.
>
>
> Dave Moore
>
>
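
For the record, your first point (that R^2 rises mechanically whenever a
regressor is added, collinear or not, and that the adjusted R^2 is the usual
correction) is easy to see in Stata. The following is only a sketch: the
variable names, seed and sample size are arbitrary, and -junk- is pure noise
by construction.

clear
set obs 200
set seed 12345
generate x1 = invnorm(uniform())
generate y = 0.5*x1 + invnorm(uniform())
generate junk = invnorm(uniform())    // unrelated to y by construction
regress y x1             // note e(r2) and e(r2_a)
regress y x1 junk        // e(r2) can only rise; e(r2_a) typically falls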

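Your numerical comparison can likewise be reproduced with -corr2data-, which
builds a dataset whose sample correlations match a supplied matrix exactly
(again only a sketch; the variable names and n() are arbitrary, and the rows
and columns of the matrices are ordered y, x1, x2):

clear
matrix C1 = (1, .2, .2 \ .2, 1, .9 \ .2, .9, 1)
corr2data y x1 x2, n(1000) corr(C1)
regress y x1 x2          // corr(x1,x2) = .9: R^2 comes out at about .042

clear
matrix C2 = (1, .2, .2 \ .2, 1, .2 \ .2, .2, 1)
corr2data y x1 x2, n(1000) corr(C2)
regress y x1 x2          // corr(x1,x2) = .2: R^2 comes out at about .067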

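And your closing point (that with truly independent regressors the simple
regressions would just "add up") can be illustrated the same way: when
corr(x1,x2) is exactly zero, the multiple-regression R^2 is simply the sum of
the two simple-regression R^2s. The .3 and .4 correlations below are arbitrary
choices for the sketch.

clear
matrix C3 = (1, .3, .4 \ .3, 1, 0 \ .4, 0, 1)
corr2data y x1 x2, n(1000) corr(C3)
regress y x1             // R^2 = .09
regress y x2             // R^2 = .16
regress y x1 x2          // R^2 = .25 = .09 + .16
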
Yours,
CLIVE NICHOLAS,
Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index