



Re: st: R squared of OLS with dummy variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: R squared of OLS with dummy variables
Date   Fri, 6 Jul 2012 10:30:53 +0100

The use of dummies (I prefer the term indicators) is not the central
issue here. Nor is there a bug. Dropping the constant is the entire
issue. Stata then changes the way that R-squared is calculated. This is
utterly defensible, and to my mind standard. I can't comment on the
"other software" you don't name.

In essence you have changed the question and the answer is therefore different.
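
To spell out what changes, here is a sketch of the two definitions as I understand them, with \hat{y}_i the fitted values, \bar{y} the sample mean of y, and n the number of observations (the labels "centered" and "uncentered" are mine, not from the post):

```latex
% With a constant: the total sum of squares is taken about the mean of y
R^2_{\text{centered}}
  = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

% With noconstant: the total sum of squares is taken about zero
R^2_{\text{uncentered}}
  = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2}

% The two denominators differ by n \bar{y}^2
\sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n \bar{y}^2
```

So with identical residuals the second figure is at least as large as the first, and much larger whenever the mean of y is far from zero.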

For more, see e.g.
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/noconstant.htm

Forcing a regression hyperplane through the origin is a very big deal
but occasionally it can (appear to) make sense on scientific or
substantive grounds. For example, there are plenty of laws of the form
y = b x in physical science. Conversely it seems very rare that you
can be confident that the intercept a in y = a + b_1 x_1 + ... + b_k x_k
should be forced to 0.

But if your problem is collinearity, dropping the constant is definitely
not the solution.

Your syntax makes little sense. It would make more sense if

predict fit if e(sample)
corr y x
di r(rho)^2

were

predict fit if e(sample)
corr y fit
di r(rho)^2

What this exposes is that in the standard linear regression model with
a constant there are various different ways of thinking about R^2, but
they all give the same answer. Away from that framework that's no
longer true.
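
A minimal sketch of the comparison, assuming the auto data shipped with Stata as a stand-in for your data (price, weight and foreign are just convenient variables, not anything from your model). Because a full set of indicators spans the constant, both regressions give the same fitted values, so only the definition of R-squared differs:

```stata
* Illustration only: auto.dta stands in for the original poster's data
sysuse auto, clear

* (1) constant kept, one indicator dropped as the base level
regress price weight i.foreign
display "R-squared with constant:      " e(r2)

* (2) constant dropped, both indicators kept
regress price weight ibn.foreign, noconstant
display "R-squared with noconstant:    " e(r2)

* squared correlation between y and the fitted values from (2)
predict double fit if e(sample)
quietly correlate price fit
display "corr(y, fit)^2:               " r(rho)^2

* R-squared about zero, computed by hand from the residuals of (2)
generate double res2 = (price - fit)^2 if e(sample)
generate double y2   = price^2 if e(sample)
quietly summarize res2
scalar rss = r(sum)
quietly summarize y2
display "1 - RSS/sum(y^2):             " 1 - scalar(rss)/r(sum)
```

The squared correlation should agree with the R-squared from (1), and the hand calculation with the R-squared reported by (2).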

Nick

On Fri, Jul 6, 2012 at 9:57 AM, Stefano Lugo
<stefano.lugo@mail.polimi.it> wrote:
> I am estimating an OLS model which includes dummy variables along with other
> regressors.
> If - instead of dropping one dummy due to collinearity - I drop the
> constant, I get the same coefficient estimates and standard errors for the
> variables (including the dummy/constant), as expected, but a very
> different R squared for the model.
>
> Computing the R squared myself with
> reg y  x_a x_b d_1 d_2 d_3, nocons
> predict fit if e(sample)
> corr y x
> di r(rho)^2
>
> I instead get the same R squared shown by Stata when dropping the dummy
> instead of the constant.
> To check whether the problem is somehow in my data, I've tried to repeat the
> same thing with simulated data and got the same problem. I have also tried
> another software package, and it instead reports the same R squared for both
> specifications.
>
> Is that a Stata bug or am I missing some theoretical explanation for this?
>

