Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: R squared of OLS with dummy variables

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: R squared of OLS with dummy variables
Date	Fri, 6 Jul 2012 10:30:53 +0100

The use of dummies (I prefer the term indicators) is not the central
issue here, nor is there a bug. Dropping the constant is the entire
issue: Stata then changes the way that R-square is calculated, measuring
the total sum of squares about zero rather than about the mean of y. This
is utterly defensible, and to my mind standard. I can't comment on the
"other software" you don't name.
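A sketch of the two definitions in play, in notation introduced here for illustration (e_i = y_i - \hat y_i are the residuals from the fit and \bar y is the mean of y); the [R] regress manual entry is the authority on Stata's exact wording.

```latex
\begin{aligned}
\text{with a constant:}\quad
  R^2 &= 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar y)^2} \\
\text{with \texttt{noconstant}:}\quad
  R_0^2 &= 1 - \frac{\sum_i e_i^2}{\sum_i y_i^2}
\end{aligned}
```

In your two parameterizations the regressors span the same space, so the fitted values, and hence the residual sum of squares, are identical; only the denominator differs. Since \sum_i y_i^2 = \sum_i (y_i - \bar y)^2 + n \bar y^2, the noconstant version is never smaller and can be much larger whenever \bar y is far from 0.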

In essence you have changed the question and the answer is therefore different.

For more, see e.g.
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/noconstant.htm

Forcing a regression hyperplane through the origin is a very big deal,
but occasionally it can (appear to) make sense on scientific or
substantive grounds. For example, there are plenty of laws of the form
y = b x in physical science. Conversely, it seems very rare that you
can be confident that the intercept a in y = a + b_1 x_1 + ... + b_k x_k
should be forced to 0.

But if your problem is collinearity, dropping the constant is definitely
not the solution.

Your syntax makes little sense: correlating y with a single variable x
says nothing about the fit of the whole model. It would make more sense if

predict fit if e(sample)
corr y x
di r(rho)^2

were

predict fit if e(sample)
corr y fit
di r(rho)^2

What this exposes is that in the standard linear regression model with
a constant there are several different ways of thinking about R^2, but
they all give the same answer. Away from that framework, that's no
longer true.
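One way to see why those routes agree only with a constant: a sketch, assuming OLS fitted values \hat y_i and residuals e_i = y_i - \hat y_i from a model whose regressors span a constant (as they do when a full set of indicators is included, even with noconstant).

```latex
\begin{aligned}
&\sum_i e_i = 0, \qquad \sum_i e_i \hat y_i = 0 \\
&\Rightarrow\ \underbrace{\sum_i (y_i - \bar y)^2}_{\text{TSS}}
  = \underbrace{\sum_i (\hat y_i - \bar y)^2}_{\text{ESS}}
  + \underbrace{\sum_i e_i^2}_{\text{RSS}} \\
&\Rightarrow\ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}
  = \frac{\text{ESS}}{\text{TSS}}
  = \left[\operatorname{corr}(y, \hat y)\right]^2
\end{aligned}
```

If the regressors do not span a constant, \sum_i e_i need not be 0, the decomposition about \bar y fails, and these expressions can disagree. This is also why the squared correlation of y with the fitted values reproduces the R^2 from the specification that keeps the constant.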

Nick

On Fri, Jul 6, 2012 at 9:57 AM, Stefano Lugo
<[email protected]> wrote:
> I am estimating an OLS model which includes dummy variables along with other
> regressors.
> If, instead of dropping one dummy because of collinearity, I drop the
> constant, I get the same coefficient estimates and standard errors for the
> variables (including the dummy/constant), as expected, but a very
> different R squared for the model.
>
> Computing the R squared myself with
> reg y  x_a x_b d_1 d_2 d_3, nocons
> predict fit if e(sample)
> corr y x
> di r(rho)^2
>
> I instead get the same R squared that Stata reports when I drop the dummy
> rather than the constant.
> To check whether the problem lies somehow in my data, I repeated the
> same exercise with simulated data and got the same result. I have also tried
> another software package, and it reports the same R squared for both
> specifications.
>
> Is that a Stata bug or am I missing some theoretical explanation for this?
>