
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: About taking log on zero values


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: About taking log on zero values
Date   Wed, 19 Feb 2014 20:11:42 +0000

Stata would ignore numeric missings in anything like a regression calculation.

That applies also to missings that result from calculating log(0).
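
A minimal sketch of that behaviour, assuming the auto dataset shipped with Stata as a stand-in for the firm data; the variable sales and the five planted zeros are invented purely for illustration:

```stata
* Illustration only: auto.dta stands in for the firm data.
sysuse auto, clear
generate sales = price
replace sales = 0 in 1/5          // pretend five firms report zero sales

* log(0) is returned as numeric missing (.)
generate logsales = log(sales)
count if missing(logsales)

* regress silently omits observations that are missing on any variable used
regress mpg weight logsales
```

The number of observations reported by regress should be smaller than the number in the dataset by the count of zeros, which is worth checking every time.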

Changing values of 0 to values of 1 so that you can take logarithms is
not something I would call "usual practice". It is, I suspect,
regarded differently by different people on a spectrum from unethical
and incorrect to an acceptable fudge, depending partly on the rest of
the data and what you are doing with them.

An incomplete list of things to think about:

0. If values of 1 occur otherwise, you have created an inconsistency.
If values between 0 and 1 occur otherwise, you have created a bigger
one. Applying log(x + 1) consistently solves this problem only by
creating another. Applying log(x + 1) and pretending that it is really
applying log(x) is not widely accepted.

1. If 0 really means what it says, changing it to 1 is a
falsification. Whether you can put a spin on it as an acceptable or
necessary falsification is an open question.

2. If 0 really means "small but not detected", changing it to e.g.
half the smallest observable value is sometimes an accepted or
acceptable modification.

3. Replacing log(0) with log(1) is not, necessarily, even a small and
conservative modification. If, apart from the values of 0, values range
from e^3 to e^6, then after logging you have 0 and otherwise a range of
3 to 6. You may have _created_ a bundle of outliers that will dominate
analyses.

4. Doing something about 0s is only necessary with logarithmic
transformation. If you have 0s in the response, you can leave them and
use a logarithmic link. That won't necessarily be a good model, but
using a logarithmic link doesn't require positive values in the
response, only that the mean function be always positive. (This
doesn't apply in your case as the variable in question is a
predictor.)

5. There are usually alternatives, such as transformations other than
logarithms.

6. I wouldn't do anything without considering some kind of sensitivity
analysis, i.e. a consideration of how much difference an arbitrary
treatment of zeros makes.

7. There is often an argument that implies that the observations with
zeros don't belong anyway.
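
Points 4 to 6 above could be sketched roughly as follows. This is only one way to set up a sensitivity check, not a recommendation: auto.dta again stands in for the firm data, the zeros are planted artificially, and every variable name and stored-estimate name is made up for the example.

```stata
* Sketch only: auto.dta stands in for the firm data; zeros are planted.
sysuse auto, clear
generate sales = price
replace sales = 0 in 1/5

* (a) Leave log(0) as missing: observations with zero sales are dropped.
generate logsales = log(sales)
regress mpg weight logsales
estimates store dropzeros

* (b) log(x + 1), applied to every observation and named as such.
generate logsalesplus1 = log(sales + 1)
regress mpg weight logsalesplus1
estimates store logplus1

* (c) Inverse hyperbolic sine, a transformation defined at zero.
generate asinhsales = asinh(sales)
regress mpg weight asinhsales
estimates store ihs

* (d) Cube root, another transformation defined at zero.
generate curtsales = sales^(1/3)
regress mpg weight curtsales
estimates store cuberoot

* Compare sample sizes and fit side by side.
estimates table dropzeros logplus1 ihs cuberoot, b se stats(N r2)

* (e) Point 4: if the zeros were in the response instead, a log link
* leaves them untouched. Poisson regression with robust standard errors
* is one such model; whether it is a good model is a separate question.
poisson sales weight mpg, vce(robust)
```

The coefficients in (a) to (d) are on different scales and are not directly comparable. What the comparison shows is how far the sample size, the fit and the conclusions about the other predictors depend on an arbitrary treatment of the zeros.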

(I have generalised your question, but suspect that zero values for
sales usually mean exactly what they say.)

Nick
njcoxstata@gmail.com

On 19 February 2014 19:44, Sebastian Say
<sebastian.statalist@gmail.com> wrote [edited]

> My question is about how Stata treats a log-transformed variable
> that draws upon an original variable that contains zero.
>
> In my dataset, I have firm sales data but some of them have values of zero. I
> created a logsales variable and noticed that those with zeros are
> indicated as a "."
>
> I plan to run a regression, e.g.
>
> reg y x1 x2 logsales
>
> My question is, how would Stata treat these "." if I do not remove them?
>
> Technically the "." should be undefined.
>
> I've read some papers and they usually put a 1 for those sales data
> with zeros in them. Is this a usual practice?