
Re: st: Log Transform Justification

From   "Nick Cox" <>
To   <>
Subject   Re: st: Log Transform Justification
Date   Thu, 30 Aug 2007 17:58:19 +0100

Austin Nichols has already picked up the main
point, about logs and logits. 

I have two further comments. 

1. I find it helpful to keep straight
the distinction between _transformations_
and _link functions_, the latter being
jargon characteristic of the generalised linear models
literature, though not exclusive to it. For
example, a classic logit model with binary
outcomes 0 and 1 does not transform the
response, nor could it do so, as logit 0 and logit 1 are both
undefined (they would be minus and plus infinity).
Rather, the point is that the logit is applied to
the mean response, which is modelled to vary between 0 and
1 (but can assume neither of those limits).
More generally, in -glm- the link function is applied
to the expected value of the response, not to the
response itself, so it is not strictly a transformation.
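A minimal Python sketch of that distinction (Python rather than Stata, with made-up 0/1 data; the helpers `logit` and `inv_logit` are illustrative names, not Stata commands). Applying the logit to each response fails, whereas for an intercept-only logit model the fitted mean is the sample proportion, which lies strictly inside (0,1) whenever both outcomes occur, so its logit is finite.

```python
import math

def logit(p):
    """Log-odds log(p / (1 - p)); finite only for 0 < p < 1."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError(f"logit undefined for p = {p}")
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit: maps a real linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Illustrative binary responses (made up).
y = [0, 1, 1, 0, 1, 1, 1, 0]

# Transforming the response itself: every value is 0 or 1, so every logit fails.
transformed = []
for value in y:
    try:
        transformed.append(logit(value))
    except ValueError:
        transformed.append(None)

# Linking the mean instead: the sample proportion is strictly between 0 and 1.
mean_y = sum(y) / len(y)
eta = logit(mean_y)  # linear predictor of an intercept-only logit model

print(transformed)
print(mean_y, eta, inv_logit(eta))
```

The point of the sketch is only where the logit is applied: to the mean, never to the individual 0s and 1s.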

2. I don't have any general recipe for responses
on (0,1), [0,1], (0,1] or [0,1) any more than
I do for responses on any other support (apart
from: always plot your data!). As
usual there is a range of models with varying
assumptions, and some experience and some prejudices
about how they work under departures from those
assumptions.

As a joint author of -betafit- I have some
affection for beta models, but affection gets
you nowhere in this field. It is made explicit that
-betafit- ignores exact 0s and 1s, so it
should be obvious that it is quite inappropriate
whenever they occur. Other procedures do appear
to work better in those circumstances.
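Why exact 0s and 1s have to be set aside can be seen from the beta log-density itself. A minimal Python sketch, assuming the usual two-shape-parameter beta density on the open interval (0,1), illustrative shape values a = 2 and b = 3 (not a fitted model), and made-up data; `beta_logpdf` is a hypothetical helper, not part of -betafit-.

```python
import math

def beta_logpdf(y, a, b):
    """Log-density of Beta(a, b) on the open interval (0, 1).

    The density involves log(y) and log(1 - y), which are undefined at
    exact 0 and 1, so such observations are signalled with None here.
    """
    if not 0.0 < y < 1.0:
        return None
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))

# Made-up fractional responses with a spike at 0 and one exact 1.
shares = [0.0, 0.0, 0.12, 0.35, 0.0, 0.8, 1.0, 0.5]

usable = [y for y in shares if 0.0 < y < 1.0]
dropped = len(shares) - len(usable)

# Log-likelihood at the illustrative shape values, using interior values only.
loglik = sum(beta_logpdf(y, 2.0, 3.0) for y in usable)

print(f"dropped {dropped} of {len(shares)} observations; loglik = {loglik:.4f}")
```

With a spike at 0, half of this small sample simply disappears from the likelihood, which is exactly the situation where a beta model is the wrong tool.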

The trickiest case appears to be whenever
there is a spike at 0, at 1, or at both in the distribution.
For example, if the response is fraction of assets
held in savings accounts, then presumably lots of
people have no savings accounts and will score exact zeros
if they are part of the sample. This comes up repeatedly
on the list. It is fascinating to observe the range of
attitudes, including those who appear to assume that there
must be a transformation that will somehow fix this, those who say,
"Just leave them out", and those who are convinced
that the answer must be a two-part model. Whatever the
attitude, seeking a panacea is not a good idea. The science
of what you are doing has to have the first call.
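For the spike-at-zero case, the bookkeeping behind a two-part model can be sketched in a few lines of Python with made-up savings shares. The overall mean splits exactly into the proportion of nonzero responses times the mean of the nonzero responses. This illustrates only that decomposition, not a recommendation; in a real two-part model each part would get its own model with predictors.

```python
# Hypothetical fractions of assets held in savings accounts; exact zeros
# are people with no savings account at all.
shares = [0.0, 0.0, 0.0, 0.2, 0.5, 0.0, 0.3, 0.0]

positive = [y for y in shares if y > 0.0]

# Part 1: how likely is any holding at all?
p_positive = len(positive) / len(shares)

# Part 2: how large is the holding, given that there is one?
mean_if_positive = sum(positive) / len(positive)

# The two parts recombine to the overall mean (up to rounding error).
overall_mean = sum(shares) / len(shares)
print(overall_mean, p_positive * mean_if_positive)
```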


Clive Nicholas wrote:
> Nick Cox wrote:
> > Some confusion here between logarithms and logits?
> If I'm thinking straight, you're arguing that the call to -glm,
> link(logit)- only makes sense for models whose dependent variables are
> already scaled 0-1, since the -link()- option does the transformation.
> That's certainly what you suggested to me here:
> I feel a Statalist sequel coming on. I've just finished re-fitting a
> batch of fractional logit models to voting intention data after
> discovering that I had log-transformed the dependent variables when it
> wasn't necessary, largely because I re-read the above post!
> If this is so, which is the most appropriate Stata routine with which
> to fit an LT-OLS regression model? Note that not everybody in my field
> thinks this to be a good idea, anyway; indeed Paolino's (2001)
> extensive Monte Carlo tests found that such models come off third best
> against pure OLS and beta-distributed models in terms of bias,
> efficiency and 'overconfidence', and across a range of distributions
> to boot. It was this paper that encouraged me to move away from such
> an approach.
> Paolino, P. (2001). "Maximum Likelihood Estimation of Models with
> Beta-Distributed Dependent Variables". Political Analysis 9(4):
> 325-346.

