Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: About taking log on zero values

From   Maarten Buis
Subject   Re: st: About taking log on zero values
Date   Thu, 20 Feb 2014 13:45:32 +0100

On Thu, Feb 20, 2014 at 1:16 PM, Austin Nichols wrote:
>  Whether sales=0 means
> "literally nothing" or "so small that it could not be detected"
> you can't do any of the things suggested without introducing bias.

Austin would have been right if the missing values were "true" missing
values. For that case there is a literature that shows that replacing
missing values with some constant (e.g. the mean) and adding an
indicator variable for missingness leads to biased estimates, e.g.
(Jones 1996) or (Allison 2002). I also commented on that before, e.g.
<>. I
suspect that Austin is basing his statement on that idea.

However, that idea does not apply here as the missing values aren't
"true" missing values: we know exactly how many sales happened to the
units with a missing value on log(sales). My solution just imposes a
particular functional form on the relationship between sales and the
dependent variable with a discrete jump at 0. If that is a reasonable
model for the data then no bias occurs; if it is not reasonable then
there is a problem, but that is trivially true for any functional form.
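
As a rough illustration of the functional form described above, here is a
minimal sketch in plain Python (not Stata, and not necessarily the exact
specification discussed earlier in the thread). The data, the helper names
design_row, solve and ols, and the choice of 0 as the fill-in constant are
all assumptions made for the example. The outcome is generated without noise
from a model that is linear in log(sales) with a discrete jump at sales = 0,
so the fit should recover a constant of 1, a slope of 2 and a jump of 3 up to
floating-point error.

```python
import math

def design_row(sales):
    """Return [constant, log_sales, zero_dummy] for one observation."""
    if sales < 0:
        raise ValueError("sales must be non-negative")
    is_zero = sales == 0
    # log(0) is undefined, so zero-sales cases get an arbitrary constant (0 here);
    # the dummy absorbs the gap between that constant and the level at zero.
    log_sales = 0.0 if is_zero else math.log(sales)
    return [1.0, log_sales, 1.0 if is_zero else 0.0]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical, noise-free data: y = 1 + 2*log(sales) when sales > 0,
# and y = 4 when sales == 0 (a jump of 3 relative to the constant).
sales = [0, 0, 0, 1, 2, 5, 10, 50, 100]
y = [4.0 if s == 0 else 1.0 + 2.0 * math.log(s) for s in sales]

b_cons, b_log, b_zero = ols([design_row(s) for s in sales], y)
print(b_cons, b_log, b_zero)
```

Filling in a constant other than 0 for the zero-sales cases would change only
the coefficient on the dummy, not the slope on log(sales), which is why the
choice of constant is harmless under this functional form.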

Hope this helps,

Allison, P.D. (2002) Missing data. Thousand Oaks: Sage.

Jones, M.P. (1996) Indicator and stratification methods for missing
explanatory variables in multiple linear regression. Journal of the
American Statistical Association, 91, 222-230.

Maarten L. Buis
Reichpietschufer 50
10785 Berlin