
st: RE: proportion of explained variance with log-transformed outcome

From   "Nick Cox"
Subject   st: RE: proportion of explained variance with log-transformed outcome
Date   Tue, 18 Nov 2003 16:01:30 -0000

There's a bundle of issues here. To avoid 
some of them, I'll focus on the idea of R-sq as 

corr(response, predicted response)^2

versus

corr(log response, predicted log response)^2 

I presume that's what you're calling 
"proportion of explained variance", 
a term many dislike, for often-rehearsed reasons. 

Even with that focus, what follows is only a partial answer. 

1 Numerical 

It's possible that these two measures are 
fairly close. Presumably that's most likely 
if corr(y, log y) is very near 1 -- in which 
case there is little point in a log transformation. 
The implication is that y is measured over so 
small a relative range that the curvature of the log 
function can be neglected. 

I would never count on them being close. But 
typically it's very easy to get both measures
and compare. 
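
A minimal sketch of that comparison, in Python with NumPy 
rather than Stata, on simulated data. Everything here is an 
assumption made for illustration: the function name 
r_squared_pair, the single predictor x, and the two simulated 
responses (one with a narrow relative range, one with a wide one). 
The back-transform is a plain exp() of the fitted values. 

```python
import numpy as np

def r_squared_pair(x, y):
    """Fit log(y) on x by least squares; return R-sq on both scales.

    Returns (corr(log y, predicted log y)^2, corr(y, exp(predicted log y))^2).
    """
    log_y = np.log(y)
    slope, intercept = np.polyfit(x, log_y, 1)
    pred_log = intercept + slope * x
    pred_raw = np.exp(pred_log)  # naive back-transform to the raw scale
    r2_log = np.corrcoef(log_y, pred_log)[0, 1] ** 2
    r2_raw = np.corrcoef(y, pred_raw)[0, 1] ** 2
    return r2_log, r2_raw

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(2003)
x = rng.uniform(0.0, 1.0, 500)
noise = rng.normal(0.0, 1.0, 500)

# Narrow relative range: y varies by roughly 10%, so log is nearly linear.
y_narrow = np.exp(0.1 * x + 0.02 * noise)
# Wide relative range: y varies over well over an order of magnitude.
y_wide = np.exp(3.0 * x + 0.5 * noise)

corr_narrow = np.corrcoef(y_narrow, np.log(y_narrow))[0, 1]
corr_wide = np.corrcoef(y_wide, np.log(y_wide))[0, 1]

r2_log_narrow, r2_raw_narrow = r_squared_pair(x, y_narrow)
r2_log_wide, r2_raw_wide = r_squared_pair(x, y_wide)

print("narrow: corr(y, log y) =", corr_narrow,
      " R-sq log =", r2_log_narrow, " R-sq raw =", r2_raw_narrow)
print("wide:   corr(y, log y) =", corr_wide,
      " R-sq log =", r2_log_wide, " R-sq raw =", r2_raw_wide)
```

A correlation is unchanged if the predictions are multiplied by 
a positive constant, so a constant retransformation correction 
applied to exp(predicted log y) would not alter the raw-scale figure. 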

2 Scientific 

In perhaps a minority of situations, one decides that 
a logarithmic scale is as convenient as the raw scale, or 
more so -- even more natural -- in which case scientifically 
(practically, sociologically, whatever) 
one is as happy working on a logarithmic scale 
as on the original. Hackneyed but genuine examples 
are pH and decibels. If the statistics also says "logarithmic 
scales are better, because then model assumptions are more 
nearly correct", then everything marches together. 

This is not an absolute distinction. It seems that
economists can flip quite easily between thinking 
about income and thinking about log income, especially 
with practice. Both scales make enormous sense. 
I don't know if "log systolic blood pressure" 
ever seems quite natural in the same way, even if 
the log transformation appeared sensible on statistical grounds. 

Clearly, there can be some tension between the 
scientific ego and the statistical id (or is it
the other way round?) if the scientist (scientific 
part of the researcher) wants to think in terms of 
the original scales (and presumably measurement must have been lousy 
if measured scales are thought dispensable). 

3 Statistical

As often mentioned on this list, one signal 
merit of generalised linear models is that
they purport to give you the best of both 
worlds, that you do the calculations 
on a transformed scale -- by courtesy of 
a link function -- but get results on the response scale. 
Perhaps that's something to check out. 
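
To make that concrete, here is a rough sketch of a 
generalised linear model with a log link and Gaussian family, 
fitted by iteratively reweighted least squares in Python with 
NumPy. It is an illustration under stated assumptions, not a 
substitute for Stata's -glm-: the function name 
fit_gaussian_log_link, the simulated data and the choice of 
Gaussian family are all made up for the example. The point is 
only that the fitted values come out on the response scale, so 
corr(response, predicted response)^2 can be computed directly. 

```python
import numpy as np

def fit_gaussian_log_link(X, y, max_iter=100, tol=1e-10):
    """Fit E[y] = exp(X @ beta) by iteratively reweighted least squares.

    Gaussian family, log link. Requires y > 0 for the starting values.
    """
    # Starting values: ordinary least squares of log(y) on X.
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(max_iter):
        eta = X @ beta                 # linear predictor (log scale)
        mu = np.exp(eta)               # fitted mean (response scale)
        z = eta + (y - mu) / mu        # working response
        sw = mu                        # sqrt of working weights mu**2
        new_beta = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(18)
n = 400
x = rng.uniform(0.0, 1.0, n)
y = np.exp(0.5 + 1.5 * x + rng.normal(0.0, 0.3, n))  # strictly positive
X = np.column_stack([np.ones(n), x])                  # intercept + x

beta_glm = fit_gaussian_log_link(X, y)
mu_hat = np.exp(X @ beta_glm)          # predictions on the response scale

# Squared correlation between response and predicted response.
r2_response = np.corrcoef(y, mu_hat)[0, 1] ** 2

# For comparison: least squares on log(y), which models the mean of log y.
beta_logols = np.linalg.lstsq(X, np.log(y), rcond=None)[0]

print("GLM coefficients (log of mean):   ", beta_glm)
print("log-OLS coefficients (mean of log):", beta_logols)
print("R-sq on response scale from GLM:   ", r2_response)
```

Note that the two sets of coefficients need not agree: the log 
link models the log of the mean response, whereas regression on 
log response models the mean of the log. 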


Buzz Burhans wrote:
> If one estimates the proportion of explained variance for a 
> model using a 
> log transformed variable, is that proportion of explained variance 
> approximately applicable to the untransformed variable ? In other 
> words,  if I derive the proportion of explained variance of 
> a dependent 
> variable in a log-transformed model associated with a 
> predictor variable, 
> does that variable also explain a similar proportion  of 
> the variance (not 
> necessarily exactly the same) in the untransformed raw 
> metric? I appreciate 
> that the variance itself in the two metrics is different, 
> but is the 
> proportion of explained variance similar?
> I have a model in which a treatment effect is significant, 
> but explains 
> little of the total variance.  The model is run on 
> transformed variables 
> (log  transformed outcome, and a fractional polynomial 
> dependent time 
> variable). Interpretation in practical terms should speak 
> to this issue of 
> minimal albeit significant treatment effect relative to 
> contribution to the 
> total variance, but I am not sure how to express this, or 
> even if I can 
> make any statement about it relative to the original raw 
> metric since I 
> modeled in the transformed metric.
