Statalist The Stata Listserver


st: RE: RE: Re: Generating predicted values for OLS with transformed dependent variables

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Re: Generating predicted values for OLS with transformed dependent variables
Date   Thu, 13 Apr 2006 12:00:10 +0100

Phil Schumm's already answered this, I think, but 
let me add further comments. 

The idea of a proper adjustment is chimerical 
here. From what perspective? If you think 
either model is correct, the other is wrong. 

Otherwise put, different models are involved, 
so you should not expect identical predictions. 
Naturally, if they happen to be close, so
much the better. 
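To see concretely why the two sets of predictions need not match: for lognormal data, exp(E[log Y]) targets the *median* while E[Y] is the *mean*, and they differ by a factor of exp(sigma^2/2). A minimal numerical sketch (in Python rather than Stata, purely for illustration; the parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "price" data: lognormal, so log(price) is normal
# with mu = 1, sigma = 0.8.
mu, sigma = 1.0, 0.8
price = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# Back-transformed OLS-style prediction: exp(E[log(price)])
# targets the median, exp(mu).
naive = np.exp(np.log(price).mean())

# GLM-with-log-link-style target: E[price] itself, which for a
# lognormal is exp(mu + sigma**2 / 2) -- strictly larger.
mean_price = price.mean()

print(f"exp(mean of logs): {naive:.3f}")       # near exp(1.00)
print(f"mean of prices:    {mean_price:.3f}")  # near exp(1.32)
```

The gap between the two numbers is not an estimation error to be "adjusted away"; it is the difference between two estimands.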

The splendid name apart, I think Box-Cox has been rather 
oversold. (Very strangely, their references as I recall 
do not include the earlier Tukey _Annals of Mathematical
Statistics_ 1957 paper which was a key forerunner.) 
The idea of using maximum likelihood to choose from 
a family of transformations is indeed a big deal, but
not a very big deal. In their own paper, the authors
end up using logs in one example and reciprocals in 
the other, which is just what good data analysts would 
have done. If Box-Cox tells you the power should be 0.1, 
most statistically-minded people I know take that 
as a signal to use logarithms. So, the main idea
to me is, in Tukeyish terms, that of a ladder of 
transformations, and the ML machinery is secondary. 
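That "ladder" attitude can be sketched in a few lines (Python rather than Stata, purely illustrative; scipy's `boxcox` does the ML fit, and the snapping-to-a-conventional-power step is my gloss on the advice above, not anyone's published procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Skewed positive data for which logs are the "right" transform.
y = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Maximum-likelihood Box-Cox estimate of the power lambda.
_, lam = stats.boxcox(y)
print(f"ML estimate of lambda: {lam:.3f}")  # close to 0 here

# Tukey-style ladder: snap the ML estimate to the nearest
# conventional power rather than reporting, say, 0.1 literally.
ladder = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
chosen = ladder[np.argmin(np.abs(ladder - lam))]
print("ladder choice:", "log" if chosen == 0 else f"power {chosen}")
```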

Box-Cox mostly seems to appeal to those terrified of 
appearing "subjective" or "arbitrary" to advisors and reviewers, 
and frightened of using their judgement based on experience
and theory. I guess that if you have no experience 
or theory to call upon the appeal will be substantial. 
It's the same kind of issue as that facing those who will not 
make the tiniest step without the sanction of a P-value. 
(P = permission to proceed?) 

[email protected] 

Daniel Schneider
> Thanks for all the useful comments.
> Just to clarify the issue: For example, the predictions based on
> log(E[price]) = XG with GLM should be identical to the predictions
> generated from E[log(price)] = XB    (fit by -regress-, generating
> B_hat), when the latter are adjusted properly?
> What would you suggest for predictions based on a Box-Cox (left-hand
> side) transformation? A two-step procedure, first estimating the
> Box-Cox transformation parameter and then using that parameter in a
> GLM to generate predicted values? 
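The "proper adjustment" alluded to in the question is usually Duan's (1983) smearing estimator: multiply exp(Xb) by the mean of the exponentiated residuals. Nick's point is that this does not make the two models agree, but for reference, a minimal Python sketch of the adjustment itself (simulated data, not from the thread; normal errors on the log scale are assumed purely for this illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# One covariate; log(price) linear in x with normal errors
# (an assumption made only for this sketch).
n = 20_000
x = rng.uniform(0, 2, size=n)
logp = 0.5 + 0.8 * x + rng.normal(0, 0.6, size=n)
price = np.exp(logp)

# OLS of log(price) on x via least squares.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, logp, rcond=None)
resid = logp - X @ b

# Naive retransformation exp(Xb) versus Duan's smearing
# correction: multiply by mean(exp(residuals)).
naive = np.exp(X @ b)
smeared = naive * np.exp(resid).mean()

print(f"mean price:            {price.mean():.3f}")
print(f"mean naive prediction: {naive.mean():.3f}")   # too small
print(f"mean smeared pred.:    {smeared.mean():.3f}") # close to mean price
```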
Phil Schumm
> > To expand on Nick's suggestion, one of the primary features of the  
> > GLM approach (as opposed to modeling a transformed variable) is to  
> > obtain predictions on the raw (i.e., untransformed) scale. So GLM  
> > is absolutely an important alternative to consider if this is a  
> > requirement.
> > 
> > The reason your results are different is that you've fit two  
> > different models.  They are:
> > 
> > E[log(price)] = XB    (fit by -regress-, generating B_hat)
> > 
> > and
> > 
> > log(E[price]) = XG    (fit by -glm-)
> > 
> > One can show that under certain conditions, you can consistently  
> > estimate G by B_hat (except for the intercept), but if those  
> > conditions aren't met, B_hat will be estimating something different.  
> > Naively assuming that B_hat estimates G is a common mistake people  
> > make when interpreting the results of a regression on a transformed  
> > variable.
> > 
> > The documentation on -glm- in [R] is a good start, but if you're  
> > using this for anything important, I'd strongly suggest picking up  
> > a copy of Generalized Linear Models (by McCullagh and Nelder), in  
> > particular the chapters "An outline of generalized linear models"  
> > and "Models with constant coefficient of variation".
