The Stata listserver


From   "Clive Nicholas" <[email protected]>
To   [email protected]
Date   Thu, 30 Oct 2003 18:20:12 -0000 (GMT)


Thanks for your full reply to my post. It's difficult to disagree with
most of what you say, but what I was attempting to demonstrate was what
happens to R^2 when the correlations between two or more statistically
significant X-variables of interest are most certainly *not* zero (say a
correlation of 0.6). When this happens, R^2 is inflated, because the
variation in Y is explained not only by the unique contributions made to
it by X1 and X2, but also partly by the *overlap* (for want of a more
precise expression!) between them.
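The overlap can be made concrete with a quick simulation, sketched here in Python rather than Stata (the 0.6 correlation, unit coefficients, and error variance are all illustrative assumptions, not from any real dataset). It fits Y on two correlated predictors and shows that the two unique increments to R^2 (the squared semipartial correlations) do not add up to the full R^2; the remainder is the shared overlap.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho = 0.6  # assumed correlation between X1 and X2

# population model: y = 1*x1 + 1*x2 + noise, with corr(x1, x2) = rho
x1, x2 = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
y = x1 + x2 + rng.normal(scale=2.0, size=n)

def r2(*cols):
    """R^2 from an OLS fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

full = r2(x1, x2)
unique1 = full - r2(x2)  # X1's unique increment (squared semipartial)
unique2 = full - r2(x1)  # X2's unique increment
overlap = full - unique1 - unique2
print(f"R^2 = {full:.3f}, unique parts sum to {unique1 + unique2:.3f}, "
      f"overlap = {overlap:.3f}")
```

With these population values the full R^2 is (2 + 2*0.6)/(2 + 2*0.6 + 4), roughly 0.444, while the two unique parts sum to only about 0.178; the gap of roughly 0.27 is variance in Y jointly explained by the overlap between X1 and X2, attributable to neither alone.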

As I said towards the end of my last post, one of the desired aims is to
build a model of explanatory variables which demonstrate *total
independence* of each other. But, since we as social scientists attempt to
model the determinants of human behaviour, that's little more than a pious
hope, since there will inevitably be some inter-correlation between
explanatory variables. The example I put forward demonstrates this, and
also invalidates the numerous futile attempts made by social scientists to
show that X1 on its own contributed a certain proportion of the R^2 out of
all the significant X's.
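The futility of such apportioning shows up directly in simulation: the "share" of R^2 credited to X1 depends entirely on whether it enters the model before or after its correlated companion. A minimal Python sketch (again, the 0.6 correlation and unit coefficients are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
rho = 0.6  # assumed correlation between X1 and X2
x1, x2 = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
y = x1 + x2 + rng.normal(scale=2.0, size=n)

def r2(*cols):
    """R^2 from an OLS fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

first = r2(x1)               # X1 entered first: gets credit for the overlap
last = r2(x1, x2) - r2(x2)   # X1 entered last: only its unique increment
print(f"X1 entered first: {first:.3f}   X1 entered last: {last:.3f}")
```

In theory the first figure is about 0.356 and the second about 0.089: the same variable's apparent "contribution" to R^2 differs roughly fourfold depending on the order of entry, so no unique share exists to report.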


> At 05:09 AM 10/29/2003 +0000, Clive Nicholas wrote:
>>unlikely to vote Labour and vice versa. Because this overlap is carried
>>forward to the computation of R^2, R^2 has been upwardly biased.
> Thanks, but I'm afraid I still don't follow.  If the beta coefficients
> were
> all zero, R^2 would be zero.  Further, while the intercorrelations of the
> Xs may affect how large R^2 is, I don't see how that causes R^2 to be
> "upwardly biased", i.e. just because something causes R^2 to be bigger
> doesn't mean that it becomes biased towards a larger value.  I'm aware of
> various consequences of multicollinearity, e.g. large standard errors,
> large confidence intervals, increased likelihood of saying a coefficient
> does not differ from zero when it really does.  But, I don't remember ever
> hearing "upwardly biased R^2" as a problem.  But that doesn't mean I
> couldn't have missed it!  But multicollinearity does not cause regression
> coefficients to be biased (wildly variable from one sample to the next,
> maybe, but not biased) so I am not sure why it would cause R^2 to be
> biased.
> What I might say instead is, suppose you have two populations.  In both
> populations, the effects of the Xs on Y are identical.  But, in one
> population, the Xs are much more highly correlated with each other than
> they are in the other population.  This will likely cause the R^2 to
> differ
> between the 2 populations.  If you just compared R^2 between the two
> populations and not the actual coefficients, you could get a very
> misleading idea of the differences between the two populations.  These
> kinds of ideas are discussed in my "Evils of R^2" handout at
> -------------------------------------------
> Richard Williams, Associate Professor
> OFFICE: (574)631-6668, (574)631-6463
> FAX:    (574)288-4373
> HOME:   (574)289-5227
> EMAIL:  [email protected]
> WWW (personal):
> WWW (department):
> *   For searches and help try:
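Richard's two-population point above can be sketched the same way: identical coefficients in both populations, different inter-correlation of the Xs, different R^2. A Python illustration (the correlations 0 and 0.6 and the unit coefficients are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def r2_for(rho):
    """R^2 when y = x1 + x2 + noise and corr(x1, x2) = rho."""
    x1, x2 = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
    y = x1 + x2 + rng.normal(scale=2.0, size=n)  # same coefficients in both populations
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

print(f"rho = 0.0: R^2 = {r2_for(0.0):.3f}")  # population value 2/6, about 0.333
print(f"rho = 0.6: R^2 = {r2_for(0.6):.3f}")  # population value 3.2/7.2, about 0.444
```

Comparing the two R^2 values alone would suggest the model "works better" in the second population, even though the coefficients, and hence the effects of the Xs on Y, are identical in both.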

Politics Building,
School of Geography, Politics and Sociology,
University of Newcastle-upon-Tyne,
NE1 7RU,
United Kingdom.
