
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: log transformation question
Date: Fri, 18 Jun 2004 11:34:31 +0100

You need to check that -growth- has a fairly smooth skewed distribution with a shorter left tail and a longer right tail. If this were so, -log(growth + 100)- should be nearly symmetric and so might have some practical justification. But if I were a paper reviewer or a thesis examiner I would want to hear that case spelled out step by step. I would still be worried about replacing what looks like a natural origin with a fudged one.

As John says, the second approach is very difficult to justify. In fact, he understated the case against it, as 0 is emphatically not the lowest possible logarithm! At most, if negative values are judged to be in some sense mistaken or irrelevant then they should be replaced by missing values, not zeros.

But, most of all, it is not obvious that you absolutely need to transform -growth- at all. I have often taught how useful transformations can be and repeatedly emphasised how logs can make your life easier. I then find students, understandably, worried about what to do when faced with air temperature variables which are skewed but with negative values. Here the situation is less clear-cut: a problem of negative values with the originals in Fahrenheit is sometimes "solved" by shifting to Celsius and it will always be "solved" by shifting to Kelvin, but in the first case the base remains arbitrary. However, my answer -- bearing in mind also various physical considerations -- is usually that the variable is often best left as is, even if highly skewed. You don't say what kind of growth you're dealing with but, whatever it is, zero is surely a natural origin.

Nick
n.j.cox@durham.ac.uk

Wallace, John

> I think it's usually a mistake to throw data away. I'd be in favour of the
> first approach, as you can do your log transformations, play with models,
> etc and then project the results back onto your original number line by
> reversing the math. The second case would only make sense to apply if the
> negative values were the result of goofy arithmetic where negative values
> wouldn't result in reality (negative brightness, or mass for instance).
> As long as negative growth makes sense (you aren't starting with a
> population of zero, for example) then it's perfectly reasonable to add an
> offset to make logarithmic math work...just keep track of what the offset
> is.
> I'll leave the stat questions for the statisticians to answer!

mbarreto@uci.edu

> i am transforming a bunch of variables into their natural logs, and i
> have read conflicting advice on how to treat the negative values, such
> as growth, which ranges from -99 to +300 in my dataset.
>
> one website suggests i just add +100 to the variable and then log it
>
> gen log_growth = ln(growth+100)
>
> a second website i visited suggests turning all negative values into 0
>
> gen log_growth = ln(growth)
> (75 missing values generated)
>
> recode log_growth .=0

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
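
The checks discussed in this thread could be sketched in Stata roughly as follows. This is only an illustration, not code from any of the posters: it assumes a numeric variable named -growth- as in the original question, and the new variable names (log_growth100, log_growth_pos) are hypothetical.

```stata
* Sketch only; assumes a numeric variable -growth- as in the original post.
* log_growth100 and log_growth_pos are hypothetical names.

* Inspect the raw distribution: skewness, percentiles, extremes
summarize growth, detail
histogram growth

* Shifted log: is it closer to symmetric than the original?
generate double log_growth100 = ln(growth + 100)
summarize log_growth100, detail

* ln() returns missing for zero or negative arguments;
* count how many observations that would affect
count if growth <= 0

* If nonpositive values are judged mistaken or irrelevant,
* leave their logs missing rather than recoding them to 0
generate double log_growth_pos = ln(growth) if growth > 0
```

Since ln(1) = 0, recoding the missing logs to 0 would treat every negative growth value as if it were exactly 1, and would place it above any observation with growth between 0 and 1, whose logs are negative. Leaving those logs missing avoids that.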

