Statalist The Stata Listserver



st: RE: Re: adjusted r square


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: adjusted r square
Date   Tue, 20 Feb 2007 22:58:39 -0000

Uli raises some good questions, which have often been answered
in the literature -- but the answers disagree. 

Some of the concerns here are pointed up by considering perfect models
that interpolate the data. The idea of a map at 1:1 scale appears as a 
literary conceit in Lewis Carroll, Jorge Luis Borges, Umberto 
Eco, etc. It underlines that the perfect model is just as difficult to 
understand as the original data. Worse, it is less likely than a 
simpler model to transfer well to other datasets. An overfitted model 
pays too much attention to quirks of the dataset by "capitalising 
on chance". 
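
A minimal sketch of the point about interpolating models, in Python 
with invented numbers: six observations generated as y = x plus an 
alternating error of 0.5, fitted once by a degree-5 polynomial that 
passes through every point and once by a straight line. The data, the 
two "new" x values, and the function names lagrange and fit_line are 
all assumptions made for illustration only.

```python
def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial passing through all (xs, ys)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        weight = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                weight *= (x - xm) / (xj - xm)
        total += yj * weight
    return total


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Invented data: the underlying relation is y = x, plus alternating "quirks".
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
ys = [x + e for x, e in zip(xs, noise)]

intercept, slope = fit_line(xs, ys)

# In-sample: the interpolating polynomial reproduces the data exactly.
interp_train_err = max(abs(lagrange(xs, ys, x) - y) for x, y in zip(xs, ys))

# New observations, judged against the underlying relation y = x.
new_xs = [4.5, 6.0]
interp_new_err = [abs(lagrange(xs, ys, x) - x) for x in new_xs]
line_new_err = [abs(intercept + slope * x - x) for x in new_xs]

print("max in-sample error, interpolant:", interp_train_err)
for x, e_int, e_lin in zip(new_xs, interp_new_err, line_new_err):
    print(f"x = {x}: interpolant error {e_int:.3f}, line error {e_lin:.3f}")
```

With these made-up numbers the interpolating polynomial fits the six 
data points perfectly but misses both new points by more than the 
straight line does. 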

What is immediate is that modelling is a trade-off problem 
between goodness of fit and parsimony or simplicity. However, people 
cannot agree on how to quantify the trade-off. It is far from self-evident
even that the number of adjustable parameters is the best metric for complexity. 
Adjusting R-square I think goes back further than other criteria such as AIC and 
BIC, but each criterion proposed has its few years of fame before another 
becomes more fashionable. I regularly read claims that AIC is widely agreed 
to give the wrong answer, to which the reaction has to be: how do they 
know? 
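
To make the trade-off concrete, here is a hedged sketch in Python of 
how R-square, adjusted R-square, AIC and BIC can be computed for a 
linear regression from its residual and total sums of squares. The 
function name fit_criteria and all the numbers are invented for 
illustration. The AIC and BIC use the Gaussian log-likelihood and count 
k predictors plus a constant as parameters; other software or textbooks 
may count the error variance too, which shifts the values by a constant 
but not the comparison between models fitted to the same data.

```python
import math


def fit_criteria(rss, tss, n, k):
    """Fit criteria for a linear regression with k predictors plus a constant.

    rss: residual sum of squares, tss: total sum of squares,
    n: number of observations.
    """
    r2 = 1 - rss / tss
    # Adjusted R-square rescales by degrees of freedom.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # Gaussian log-likelihood evaluated at the ML estimate of the variance.
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    p = k + 1  # assumption: parameters counted are the k slopes and the constant
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}


# Invented example: adding a third predictor lowers RSS only slightly.
small = fit_criteria(rss=40.0, tss=100.0, n=50, k=2)
large = fit_criteria(rss=39.5, tss=100.0, n=50, k=3)

for name, result in (("2 predictors", small), ("3 predictors", large)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

With these invented numbers R-square rises slightly with the extra 
predictor, while adjusted R-square falls and AIC and BIC both rise, so 
all three penalised criteria prefer the smaller model. The size of each 
penalty differs, which is one way to see that the criteria need not 
agree in general. 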

Nick 
n.j.cox@durham.ac.uk 

Ulrich Kohler
 
> However, as an aside: I do not find the arguments for the adjusted R2 very
> convincing. It is sometimes said that you have to be punished for including
> additional variables in a model. But why? Because the R2 increases? Why do I
> need to be punished for this? It is just a simple fact that I can explain
> more variance with an additional variable. Punishment and especially the
> amount of punishment is pure metaphysics. The set of control variables should
> be compiled on theoretical reasons alone. If your model contains some
> variables that should be excluded on theoretical reasons, exclude them (or you
> will get punished by your reviewer). Likewise, if your model does not include
> a variable in the model that should be there on theoretical reasons, include
> it (or your reviewer will punish you as well).
> 
> Needless to say that it might happen that your reviewer might punish you for
> not using the adjusted R2. ;-)
