
# Re: st: About taking log on zero values

From: Nick Cox
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: About taking log on zero values
Date: Wed, 19 Feb 2014 20:18:21 +0000

```
Actually the worst thing you can do!

People often get this the wrong way round, and forget what they know

However, a very small non-zero value implies a very big negative
logarithm. Far from being a conservative change, it is a radical one.
You _create_ one or more outliers that will dominate any later
analysis.

For convenience assume -log10()-. Now

log10(a millionth) = -6
log10(a billionth) = -9

etc.

So, while arbitrarily small fractions are, agreed, arbitrarily close
to zero, on a logarithmic scale they are arbitrarily far from log(1) =
0, which is what is important once you choose a logarithmic scale.

Nick
njcoxstata@gmail.com

On 19 February 2014 19:56, Alfonso Sanchez-Penalver
<alfonso.statalist@gmail.com> wrote:
> Seb,
>
> Stata would interpret the "." as a missing value and thus drop the observation from the estimation. You would thus only be regressing the observations with positive values of the original variable. A simple trick to avoid losing any observations is to add a very small constant (say 0.00000001) to those zero values before taking logs. That would keep all observations. I'm sure this will have many detractors too.
>
> In your case, I guess that entering the log of sales as an explanatory variable is meant to capture nonlinearities in the relationship? If that's the case, to avoid the problem with the zeros, have you thought of entering a quadratic relationship with sales instead of a linear one?
>
> Best,
>
> Alfonso Sanchez-Penalver
>
>> On Feb 19, 2014, at 2:44 PM, Sebastian Say <sebastian.statalist@gmail.com> wrote:
>>
>> Hi, my question is about how Stata treats a log-transformed variable
>> that draws upon an original variable that contains zeros.
>>
>> In my dataset, I have firm sales data, but some firms have zero sales. I
>> created a logsales variable and noticed that those with zeros are
>> indicated as a "."
>>
>> I plan to run a regression, e.g.
>>
>> reg y x1 x2 logsales
>>
>> My question is, how would Stata treat these "." if I do not remove them?
>>
>> Technically the "." should be undefined.
>>
>> I've read some papers and they usually put a 1 for those sales data
>> with zeros in them. Is this a usual practice?
>>
>> Thank you very much.
>>
>> Seb
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
```
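
The two points in the thread (a zero gives a missing log, and a tiny substituted constant gives a huge negative log) can be checked with a minimal Stata sketch. The sales values below are made up for illustration, and the variable name `logsales_eps` is hypothetical; `logsales` and the constant 0.00000001 are taken from the quoted messages.

```stata
* Toy data (assumed values, not from the thread): one firm with zero sales
clear
input sales
0
10
100
1000
10000
end

* log10(0) is undefined, so Stata stores a missing value (.)
generate logsales = log10(sales)
count if missing(logsales)

* Replace zeros with the tiny constant suggested in the thread before logging
generate logsales_eps = log10(cond(sales == 0, 0.00000001, sales))
list sales logsales logsales_eps

* The substituted observation lands at about -8; the others span 1 to 4
summarize logsales logsales_eps
```

With these toy numbers, the mean of `logsales` over the four non-missing observations is 2.5, while the mean of `logsales_eps` over all five is about 0.4: the single substituted value pulls the summary far more than any genuine observation does.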