
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: st: About taking log on zero values

From   "Schaffer, Mark E" <>
To   "" <>
Subject   RE: st: About taking log on zero values
Date   Thu, 20 Feb 2014 13:53:44 +0000

This is mostly on-topic (I think)... what do you all think of using the inverse hyperbolic sine transformation,

ihs(y) = log( y + sqrt(y^2+1) )

as a way of dealing with the log-of-zero issue? It is defined at zero (ihs(0) = 0), and approximately log-ish except at small y.
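To put numbers on "approximately log-ish except at small y", here is a minimal sketch in Python (the language is an assumption, since the thread itself is about Stata). The name `ihs` just follows the notation above; Python's standard library already provides the same function as `math.asinh`. For large y, sqrt(y^2 + 1) is close to y, so ihs(y) is close to log(2y) = log(y) + log(2): a constant shift, which is why differences in ihs behave like differences in log away from zero.

```python
import math


def ihs(y: float) -> float:
    """Inverse hyperbolic sine written as in the post: log(y + sqrt(y^2 + 1)).

    Mathematically identical to math.asinh(y). math.asinh is the safer
    choice in practice: this direct formula loses precision for large
    negative y (cancellation inside the log) and can even fail there.
    """
    return math.log(y + math.sqrt(y * y + 1.0))


if __name__ == "__main__":
    # ihs is defined at y = 0, where log is not; for large y it tracks log(2y).
    for y in [0.0, 0.5, 1.0, 10.0, 100.0, 1000.0]:
        log_y = f"{math.log(y):8.4f}" if y > 0 else "     n/a"
        log_2y = f"{math.log(2 * y):8.4f}" if y > 0 else "     n/a"
        print(f"y={y:8.1f}  ihs={ihs(y):8.4f}  log={log_y}  log(2y)={log_2y}")
```

The printed table shows the gap between ihs(y) and log(y) settling near log(2) as y grows, while at y = 0 and y = 0.5 the two transformations are clearly different.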

Nice little blog discussion here by Frances Woolley and commenters:

And the references cited in her blog entry:

Burbidge, John B., Lonnie Magee and A. Leslie Robb. 1988. "Alternative Transformations to Handle Extreme Values of the Dependent Variable." Journal of the American Statistical Association 83(401): 123-127.

MacKinnon, James G. and Lonnie Magee. 1990. "Transforming the Dependent Variable in Regression Models." International Economic Review 31(2): 315-339 (May).

Pence, Karen M. 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving." Contributions to Economic Analysis & Policy 5(1), Article 20. Available at:


> -----Original Message-----
> From: [mailto:owner-] On Behalf Of Maarten Buis
> Sent: 20 February 2014 12:46
> To:
> Subject: Re: st: About taking log on zero values
> On Thu, Feb 20, 2014 at 1:16 PM, Austin Nichols wrote:
> >  Whether sales=0 means
> > "literally nothing" or "so small that it could not be detected"
> > you can't do any of the things suggested without introducing bias.
> Austin would have been right if the missing values were "true" missing
> values. For that case there is a literature showing that replacing missing
> values with some constant (e.g. the mean) and adding an indicator variable
> for missingness leads to biased estimates (e.g. Jones 1996; Allison 2002).
> I have also commented on that before, e.g.
> <>. I suspect
> that Austin is basing his statement on that idea.
> However, that idea does not apply here, as the missing values aren't "true"
> missing values: we know exactly how many sales the units with a missing
> value on log(sales) had, namely zero. My solution just imposes a particular
> functional form on the relationship between sales and the dependent variable,
> with a discrete jump at 0. If that is a reasonable model for the data, then no
> bias occurs; if it is not reasonable, then there is a problem, but that is
> trivially true for any functional form.
> Hope this helps,
> Maarten
> Allison, P.D. (2002) Missing data. Thousand Oaks: Sage.
> Jones, M.P. (1996) Indicator and stratification methods for missing
> explanatory variables in multiple linear regression. Journal of the American
> Statistical Association, 91, 222-230.
> ---------------------------------
> Maarten L. Buis
> Reichpietschufer 50
> 10785 Berlin
> Germany
> ---------------------------------

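Maarten's original suggestion is not quoted in this message. Assuming it is the common coding in which log(sales) is set to an arbitrary constant (here 0) for units with zero sales and a 0/1 zero-sales indicator is added as a separate regressor, the hypothetical Python sketch below shows the functional form that coding implies: a log curve for positive sales plus a separate, freely estimated level at sales = 0. The helper names `code_sales` and `predict` and the coefficient values are made up for illustration; they are not estimates from any data.

```python
import math


def code_sales(sales: float, fill: float = 0.0) -> tuple[float, int]:
    """Hypothetical regressor coding for one unit: (log_sales, zero_dummy).

    Positive sales get log(sales) and dummy 0; zero sales get the arbitrary
    constant `fill` in the log term and dummy 1.
    """
    if sales < 0:
        raise ValueError("sales must be non-negative")
    if sales == 0:
        return fill, 1
    return math.log(sales), 0


def predict(sales: float, b0: float, b_log: float, b_zero: float,
            fill: float = 0.0) -> float:
    """Linear predictor b0 + b_log * log_sales + b_zero * zero_dummy."""
    log_sales, zero = code_sales(sales, fill)
    return b0 + b_log * log_sales + b_zero * zero


if __name__ == "__main__":
    b0, b_log, b_zero = 1.0, 0.5, -2.0  # illustrative values, not estimates
    for s in [0.0, 0.01, 1.0, 100.0]:
        print(f"sales={s:7.2f}  predicted={predict(s, b0, b_log, b_zero):8.4f}")

    # Re-coding zeros with fill=5 and shifting b_zero by -b_log*5 gives the
    # same fitted value at sales == 0: the dummy absorbs the fill constant.
    fill = 5.0
    same = predict(0.0, b0, b_log, b_zero - b_log * fill, fill=fill)
    print(same == predict(0.0, b0, b_log, b_zero))
```

The last check illustrates why the choice of constant does not matter under this coding: any fill value is exactly offset by the indicator's coefficient, so the fitted values and the coefficient on log(sales) are unchanged. What the model does assume is that the level at sales = 0 is unrelated to where the log curve would head as sales approach 0, which is the "discrete jump" in the message above.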


© Copyright 1996–2018 StataCorp LLC