Stata The Stata listserver

st: RE: RE: log transformation question


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: log transformation question
Date   Fri, 18 Jun 2004 11:34:31 +0100

You need to check that -growth- has a fairly smooth 
skewed distribution with a shorter left tail 
and a longer right tail. If that were so, 
-log(growth + 100)- should be 
nearly symmetric and so might have some practical justification. 
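
As a rough sketch of that check, and no more than that: the 
commands below assume -growth- is in memory with every value 
above -100 (as in the range quoted), and -lgrowth100- is just 
an illustrative name. 

```stata
* Sketch only: compare the shape of growth before and after the shift.
* Assumes -growth- is in memory and growth > -100 throughout.
summarize growth, detail              // skewness on the original scale

* lgrowth100 is a hypothetical variable name
generate double lgrowth100 = ln(growth + 100)
summarize lgrowth100, detail          // skewness after shift and log

* Visual check of how close the transformed variable is to symmetric
qnorm lgrowth100
```

Skewness near zero after the transformation, with nothing odd 
in the plot, would be the kind of evidence a reviewer might 
ask for; it does not settle whether 100 is a defensible offset. 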

But if I were a paper reviewer or a thesis examiner I would 
want to hear that case spelled out step by step. I would still 
be worried about replacing what looks like a natural origin 
with a fudged one. 

As John says, the second approach is very difficult 
to justify. In fact, he understated the case against 
it, as 0 is emphatically not the lowest possible 
logarithm! At most, if negative values are judged
to be in some sense mistaken or irrelevant then 
they should be replaced by missing values, not 
zeros.
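
A small sketch of what that means in practice, assuming 
-growth- is in memory and using -lgrowth- as an illustrative 
name: Stata's -ln()- already returns missing for zero and 
negative arguments, so no -recode- is needed to get missing 
values. Note also that the log of a small positive number is 
well below zero (ln(0.01) is about -4.6), which is why 0 is 
not a floor. 

```stata
* Sketch only: ln() yields missing for growth <= 0, so leave those as missing.
* lgrowth is a hypothetical variable name.
generate double lgrowth = ln(growth)

* How many observations were lost because growth was zero or negative?
count if missing(lgrowth) & !missing(growth)

* Logs of small positive values are strongly negative, so 0 is no lower bound
display ln(0.01)
```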

But, most of all, it is not obvious that you absolutely 
need to transform -growth- at all. 

I have often taught how useful transformations can 
be and repeatedly emphasised how logs can make 
your life easier. I then find students, understandably, 
worried about what to do when faced with air temperature 
variables which are skewed but with negative values. 
Here the situation is less clear-cut: a problem 
of negative values with the originals in Celsius is 
sometimes "solved" by shifting to Fahrenheit and it 
will always be "solved" by shifting to Kelvin, but 
in the first case the base remains arbitrary. However, 
my answer -- bearing in mind also various physical 
considerations -- is usually that the variable is 
often best left as is, even if highly skewed. 
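
To make the unit shifts concrete, here is a sketch assuming a 
hypothetical variable -temp_c- holding air temperature in 
degrees Celsius; the other names are illustrative too. 

```stata
* Sketch only: temp_c, temp_f and temp_k are hypothetical variable names.
generate double temp_f = temp_c * 9/5 + 32     // Fahrenheit: zero is still arbitrary
generate double temp_k = temp_c + 273.15       // Kelvin: measured from absolute zero

* Which scales still contain zero or negative values?
count if temp_c <= 0 & !missing(temp_c)
count if temp_f <= 0 & !missing(temp_f)
count if temp_k <= 0 & !missing(temp_k)        // physically impossible, so expect 0
```

Being able to take logs after such a shift says nothing about 
whether the logged variable is the better one to analyse. 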

You don't say what kind of growth you're dealing 
with but, whatever it is, zero is surely a natural 
origin. 

Nick 
n.j.cox@durham.ac.uk 

Wallace, John
 
> I think it's usually a mistake to throw data away.  I'd be in favour of the
> first approach, as you can do your log transformations, play with models,
> etc and then project the results back onto your original number line by
> reversing the math.  The second case would only make sense to apply if the
> negative values were the result of goofy arithmetic where negative values
> wouldn't result in reality (negative brightness, or mass for instance).
> As long as negative growth makes sense (you aren't starting with a
> population of zero, for example) then it's perfectly reasonable to add an
> offset to make logarithmic math work...just keep track of what the offset
> is.
> I'll leave the stat questions for the statisticians to answer!

mbarreto@uci.edu 
 
> i am transforming a bunch of variables into their natural logs, and i 
> have read conflicting advice on how to treat the negative values, such 
> as growth, which ranges from -99 to +300 in my dataset.
> 
> one website suggests i just add +100 to the variable and then log it
> 
> gen log_growth = ln(growth+100)
> 
> a second website i visited suggests turning all negative values into 0
> 
> gen log_growth = ln(growth)
> (75 missing values generated)
> 
> recode log_growth .=0



