From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: R squared of OLS with dummy variables
Date: Fri, 6 Jul 2012 10:30:53 +0100

The use of dummies (I prefer the term indicators) is not the central issue here. Nor is there a bug. Dropping the constant is the entire issue. Stata then changes the way that R-square is calculated. This is utterly defensible, and to my mind standard. I can't comment on the "other software" you don't name. In essence you have changed the question and the answer is therefore different.

For more, see e.g.

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/noconstant.htm

Forcing a regression hyperplane through the origin is a very big deal but occasionally it can (appear to) make sense on scientific or substantive grounds. For example, there are plenty of laws of the form y = b x in physical science. Conversely it seems very rare that you can be confident that a in y = a + b_1 x_1 + ... + b_k x_k should be forced to 0. But if your problem is collinearity, it's definitely not the solution.

Your syntax makes little sense. It would make more sense if

predict fit if e(sample)
corr y x
di r(rho)^2

were

predict fit if e(sample)
corr y fit
di r(rho)^2

What this exposes is that in the standard linear regression model with a constant there are various different ways of thinking about R^2, but they all give the same answer. Away from that framework that's no longer true.

Nick

On Fri, Jul 6, 2012 at 9:57 AM, Stefano Lugo <stefano.lugo@mail.polimi.it> wrote:

> I am estimating an OLS model which includes dummy variables along with other
> regressors.
> If - instead of dropping one dummy due to collinearity - I drop the
> constant, I get the same coefficient estimates (including the
> dummy/constant) and standard errors, as expected, but a very
> different R squared of the model.
>
> Computing the R squared myself with
> reg y x_a x_b d_1 d_2 d_3, nocons
> predict fit if e(sample)
> corr y x
> di r(rho)^2
>
> I get instead the same R squared shown by Stata when dropping the dummy
> instead of the constant.
> To check whether the problem is somehow in my data, I've tried to repeat the
> same thing with simulated data and got the same problem. I have also tried
> using another software package and it reports instead the same R squared for
> both specifications.
>
> Is that a Stata bug or am I missing some theoretical explanation for this?
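
The point made in the reply can be checked on simulated data. With noconstant, Stata computes R-squared against the uncentered total sum of squares (the sum of y^2) rather than the sum of squared deviations from the mean of y, so the reported value changes even though the fitted values do not. The following is only a minimal sketch: the variable names follow the original post, while the seed, sample size, grouping variable g, coefficients and scalar names are arbitrary assumptions made for illustration.

```stata
* Simulated data; seed, sample size and coefficients are arbitrary assumptions
clear
set seed 12345
set obs 300
generate x_a = rnormal()
generate x_b = rnormal()
generate byte g = 1 + mod(_n, 3)
quietly tabulate g, generate(d_)      // creates indicators d_1 d_2 d_3
generate y = 5 + 2*x_a - x_b + d_2 + 2*d_3 + rnormal()

* (1) Constant kept, one indicator dropped: usual (centered) R-squared
quietly regress y x_a x_b d_2 d_3
scalar r2_cons = e(r2)

* (2) All indicators kept, constant dropped: R-squared is now uncentered
quietly regress y x_a x_b d_1 d_2 d_3, noconstant
scalar r2_nocons = e(r2)

* (3) Squared correlation between y and the fitted values from model (2)
predict fit if e(sample)
quietly correlate y fit
scalar r2_corr = r(rho)^2

display "R-squared, constant kept:          " scalar(r2_cons)
display "R-squared, noconstant:             " scalar(r2_nocons)
display "corr(y, fit)^2 after noconstant:   " scalar(r2_corr)
```

Because the three indicators sum to one, models (1) and (2) span the same column space and give identical fitted values. The squared correlation in (3) should therefore agree with the R-squared from (1) up to rounding, while the R-squared reported for (2) is larger whenever the mean of y is not zero.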