Statalist The Stata Listserver


st: RE: Re: adjusted r square

From   "Marcos Delprato" <>
To   <>
Subject   st: RE: Re: adjusted r square
Date   Wed, 21 Feb 2007 17:22:24 -0000

Dear Uli,

You may also want to read recent papers on causal effects that integrate
and generalise previously isolated approaches (e.g. DAGs, treatment
evaluation and machine learning). You'll find them at:

Hope it helps.


-----Original Message-----
[] On Behalf Of Ulrich Kohler
Sent: 21 February 2007 10:05
Subject: st: Re: adjusted r square

Richard Williams wrote:
> At 10:34 AM 2/20/2007, Ulrich Kohler wrote:
>>However, as an aside: I do not find the arguments for the adjusted R2 very
>>convincing. It is sometimes said that you have to be punished for
>> including additional variables in a model. But why? Because the R2
>> increases? Why do I need to be punished for this? It is just a simple
>> fact that I can explain more variance with an additional variable.
>> Punishment and especially the
> I don't think "punishment" is the original rationale for adjusted
> R^2, although that is often cited as one of its benefits.  Rather,
> R^2 is biased upwards, especially in small samples.  Adjusted R^2
> corrects for that.
> McClendon discusses this in "Multiple Regression and Causal
> Analysis", 1994, pp. 81-82.
> Basically he says that sampling error will always cause R^2 to be
> greater than zero, i.e. even if no variable has an effect R^2 will be
> positive in a sample.  When there are no effects, across multiple
> samples you will see estimated coefficients sometimes positive,
> sometimes negative, but either way you are going to get a non-zero
> positive R^2. Further, when there are many Xs for a given sample
> size, there is more opportunity for R^2 to increase by chance.
> So, adjusted R^2 wasn't primarily designed to "punish" you for
> mindlessly including extraneous variables (although it has that
> effect), it was just meant to correct for the inherent upward bias in
> regular R^2.
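
The upward bias described above can be illustrated with a small simulation.
The following is only a sketch under stated assumptions: one predictor, a
sample size of 10, standard normal data, and x and y generated independently
so that the true R^2 is zero. The function names and the settings are
hypothetical choices made for illustration and do not come from McClendon.
Across replications the average R^2 should sit clearly above zero (in theory
near k/(n-1) for k predictors), while the average adjusted R^2 should be
close to zero.

```python
import random
import statistics


def r2_and_adjusted(x, y):
    """Return (R^2, adjusted R^2) for a simple regression of y on x."""
    n, k = len(x), 1  # k = number of predictors (assumed to be 1 here)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy ** 2 / (sxx * syy)  # squared correlation = R^2 with one predictor
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


def simulate(n=10, reps=5000, seed=1):
    """Average R^2 and adjusted R^2 when x has no effect on y."""
    rng = random.Random(seed)
    r2_values, adj_values = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of x
        r2, adj = r2_and_adjusted(x, y)
        r2_values.append(r2)
        adj_values.append(adj)
    return statistics.fmean(r2_values), statistics.fmean(adj_values)


mean_r2, mean_adj = simulate()
print(f"mean R^2          = {mean_r2:.3f}")
print(f"mean adjusted R^2 = {mean_adj:.3f}")
```

Raising n in simulate() shrinks the average R^2 toward zero, which matches
the remark that the bias matters most in small samples.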

Thank you, Richard, for this clarification. I wasn't aware of this. Obviously
my critique was overstated ("metaphysics"). The reason for my furor is that I
have seen so many students build models simply by adding variables that
increase the adjusted R2, believing that they end up with a model that holds
only "important" variables. I think this is a misunderstanding of what models
are about.

I am interested in the "causal" effect of a "key causal variable". Therefore
I must not include in my model an independent variable that is itself caused
by the key causal variable, but I must include all variables that are causes
of the key causal variable. However, sometimes I include a specific variable
that depends on the key causal variable in a second model. In this case I
look at the change in the key causal variable's effect to learn something
about the mechanisms through which the key causal variable affects the
dependent variable.
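
The two-model strategy described above can be sketched with simulated data.
Everything in this example is a hypothetical assumption made for
illustration: z causes the key causal variable x, m depends on x, and y
depends on x, m and z with made-up coefficients. The small ols() helper is
written here with the standard library only and is not a Stata or library
routine. Under these assumptions the first model (y on x and z) should
recover the total effect of x, about 0.3 + 0.5 * 0.6 = 0.6, the second model
(adding m) should leave only the direct effect of about 0.3, and omitting z
should overstate the effect.

```python
import random


def ols(columns, y):
    """OLS coefficients (intercept first) from the normal equations."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


rng = random.Random(42)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]                      # cause of x
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]                 # key causal variable
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]                 # depends on x
y = [0.3 * xi + 0.6 * mi + 0.7 * zi + rng.gauss(0, 1)
     for xi, mi, zi in zip(x, m, z)]

total_effect = ols([x, z], y)[1]       # first model: controls for the cause of x
direct_effect = ols([x, z, m], y)[1]   # second model: adds the variable caused by x
naive_effect = ols([x], y)[1]          # omits z, so the estimate is confounded

print(f"x effect, controlling for z:       {total_effect:.2f}")
print(f"x effect, controlling for z and m: {direct_effect:.2f}")
print(f"x effect, no controls:             {naive_effect:.2f}")
```

The drop in the coefficient on x between the first and second model is the
part of the effect that runs through m. None of these decisions uses the
model fit.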

All this happens without reference to the model fit. The whole process is
controlled by *hypotheses* about the causal order between variables. Building
models by looking at the model fit shifts the attention to a different
sort of reasoning. It is the sort of reasoning that leads directly to
automatic model-building strategies ("stepwise" and the like), and
somehow the arguments for the "adjusted" R2 lean a little in this
direction.

My reasoning here is very much based on estimating the size of a "causal
effect". I am fully aware of the problems of estimating causal effects with
regression models (and related techniques). But despite all these critiques,
I think that there is no alternative to this framework as a guiding idea for
the model-building strategy.

I hope I didn't dwell too much on the obvious.

Many regards


Ulrich Kohler
*   For searches and help try:
