Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: Re: st: About taking log on zero values

Date: Wed, 19 Feb 2014 20:11:42 +0000

Stata ignores observations with numeric missing values in anything like a regression calculation. That applies also to the missings that result from calculating log(0), so firms with zero sales would simply be omitted from your regression.

Changing values of 0 to values of 1 so that you can take logarithms is not something I would call "usual practice". It is, I suspect, regarded differently by different people, on a spectrum from unethical and incorrect to an acceptable fudge, depending partly on the rest of the data and on what you are doing with them.

An incomplete list of things to think about:

0. If genuine values of 1 occur elsewhere in the data, you have created an inconsistency: a recoded 0 and a real 1 become indistinguishable. If values between 0 and 1 occur elsewhere, you have created a bigger one, as those values have negative logarithms and so end up below the former zeros. Applying log(x + 1) consistently solves this problem only by creating another. Applying log(x + 1) and pretending that it is really applying log(x) is not widely accepted.

1. If 0 really means what it says, changing it to 1 is a falsification. Whether you can put a spin on it as an acceptable or necessary falsification is an open question.

2. If 0 really means "small but not detected", changing it to, e.g., half the smallest observable value is sometimes an accepted or acceptable modification.

3. Replacing log(0) with log(1) is not necessarily even a small and conservative modification. If, apart from the values of 0, values range from e^3 to e^6, then after logging you have 0 and otherwise a range of 3 to 6. You may have _created_ a bundle of outliers that will dominate analyses.

4. Doing something about 0s is only necessary with a logarithmic transformation. If you have 0s in the response, you can leave them and use a logarithmic link. That won't necessarily be a good model, but using a logarithmic link doesn't require positive values in the response, only that the mean function be always positive. (This doesn't apply in your case, as the variable in question is a predictor.)

5. There are usually alternatives, such as transformations other than logarithms.

6. I wouldn't do anything without considering some kind of sensitivity analysis, i.e. a consideration of how much difference an arbitrary treatment of zeros makes.

7. There is often an argument that implies that the observations with zeros don't belong anyway.

(I have generalised your question, but suspect that zero values for sales usually mean exactly what they say.)

Nick
njcoxstata@gmail.com

On 19 February 2014 19:44, Sebastian Say <sebastian.statalist@gmail.com> wrote [edited]:

> My question is about how Stata treats a log-transformed variable
> that draws upon an original variable that contains zero.
>
> In my dataset, I have firm sales data but some of them have values of zero. I
> created a logsales variable and noticed that those with zeros are
> indicated as a "."
>
> I plan to run a regression, e.g.
>
> reg y x1 x2 logsales
>
> My question is, how would Stata treat these "." if I do not remove them?
>
> Technically the "." should be undefined.
>
> I've read some papers and they usually put a 1 for those sales data
> with zeros in them. Is this a usual practice?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
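
The arithmetic in Nick's points 0 and 3 can be checked with a small sketch. It is written in Python rather than Stata purely for illustration, and it assumes hypothetical sales figures: positive values running from e^3 to e^6 plus two zeros that are recoded to 1 before logging. The helper name log_with_zero_as_one is made up for this example. The sketch shows the recoded zeros landing about 3 log units below everything else, a genuine 1 becoming indistinguishable from a recoded 0, and a value of 0.5 ending up below the former zero.

```python
import math

def log_with_zero_as_one(values):
    """Hypothetical helper: recode 0 to 1, then take natural logs."""
    return [math.log(v if v > 0 else 1.0) for v in values]

# Hypothetical sales: positives between e^3 and e^6, plus two zeros (point 3).
sales = [0.0, math.exp(3), math.exp(4), math.exp(4.5), math.exp(5), math.exp(6), 0.0]

logged = log_with_zero_as_one(sales)
positives = [x for v, x in zip(sales, logged) if v > 0]
zeros = [x for v, x in zip(sales, logged) if v == 0]

print("logged positives range:", min(positives), "to", max(positives))
print("former zeros now sit at:", zeros)

# Distance, in log units, between the recoded zeros and the rest of the data.
gap = min(positives) - max(zeros)
print("gap in log units:", gap)

# Point 0: a genuine 1 and a recoded 0 both become log(1) = 0.
collide = log_with_zero_as_one([0.0, 1.0])
print("0 and 1 after recoding and logging:", collide)

# Point 0, continued: a genuine 0.5 gets a negative log, below the former 0.
below = log_with_zero_as_one([0.0, 0.5])
print("0 and 0.5 after recoding and logging:", below)
```

Nothing here depends on Python; the same recoding done with Stata's log() function would place the former zeros in the same spot relative to the other values.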

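Point 6 asks how much difference an arbitrary treatment of zeros makes. A minimal sensitivity-analysis sketch, again in Python and assuming a single predictor with made-up data, fits the same simple least-squares regression under four treatments: dropping the zeros (what Stata's default exclusion of missing log(0) values amounts to), recoding 0 to 1 as in the question, replacing 0 with half the smallest positive value (point 2), and log(x + 1) applied to every value (point 0). The function names ols_slope and treat_zeros and the data are hypothetical.

```python
import math

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def treat_zeros(sales, y, how):
    """Return (log_sales, y) under one treatment of zero sales (hypothetical helper)."""
    positives = [s for s in sales if s > 0]
    if how == "drop":          # log(0) is missing, so the observation is excluded
        pairs = [(math.log(s), b) for s, b in zip(sales, y) if s > 0]
    elif how == "zero_to_one": # recode 0 to 1, so its log is 0
        pairs = [(math.log(s if s > 0 else 1.0), b) for s, b in zip(sales, y)]
    elif how == "half_min":    # recode 0 to half the smallest positive value
        sub = min(positives) / 2
        pairs = [(math.log(s if s > 0 else sub), b) for s, b in zip(sales, y)]
    elif how == "log1p":       # log(x + 1) applied to every value
        pairs = [(math.log1p(s), b) for s, b in zip(sales, y)]
    else:
        raise ValueError(f"unknown treatment: {how}")
    xs, ys = zip(*pairs)
    return list(xs), list(ys)

# Hypothetical data, made up for illustration only.
sales = [0.0, 0.0, 25.0, 60.0, 150.0, 300.0, 420.0]
y = [2.0, 2.5, 3.1, 3.9, 4.8, 5.6, 6.1]

slopes = {}
ns = {}
for how in ("drop", "zero_to_one", "half_min", "log1p"):
    xs, ys = treat_zeros(sales, y, how)
    slopes[how] = ols_slope(xs, ys)
    ns[how] = len(xs)
    print(f"{how:12s} n={ns[how]} slope={slopes[how]:.3f}")
```

With these made-up numbers the 0-to-1 recoding roughly halves the slope relative to dropping the zeros, which is the kind of instability point 6 warns about. In Stata the same comparison would mean generating one logged variable per treatment and rerunning -regress- on each.
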
**Follow-Ups**:
- **Re: st: About taking log on zero values** *From:* Maarten Buis <maartenlbuis@gmail.com>
- **Re: st: About taking log on zero values** *From:* Jeph Herrin <info@flyingbuttress.net>

**References**:
- **st: About taking log on zero values** *From:* Sebastian Say <sebastian.statalist@gmail.com>
