
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: About taking log on zero values


From   Alfonso Sánchez-Peñalver <alfonso.statalist@gmail.com>
To   Stata List <statalist@hsphsun2.harvard.edu>
Subject   Re: st: About taking log on zero values
Date   Thu, 20 Feb 2014 10:16:56 -0500

Hi again,

I know that my previous suggestion about adding a small value to sales may not have been a great idea, but I have seen it done on several occasions. In any case, the best solution, if feasible, would be to estimate the values that ln(sales) would take for those zeros using either a Tobit or a Heckman sample selection model. Whether this is feasible depends on whether you have variables available that could serve as explanatory variables for these models. Chapter 16 in

Cameron, A. Colin, and Pravin K. Trivedi (2010), Microeconometrics Using Stata, Revised Edition, Stata Press, College Station, TX USA (http://www.stata-press.com/books/microeconometrics-stata/)

provides great examples and explanations of how to estimate either of these two models using a log transformation of the response variable. Since you’re going to use ln(sales) as an explanatory variable in the model of interest, you only need to predict ln(sales) for those observations where sales == 0.

Alfonso Sánchez-Peñalver, PhD

Visiting Assistant Professor
Suffolk University
Senior Instructor
UMass Boston



On Feb 20, 2014, at 8:53 AM, Schaffer, Mark E <M.E.Schaffer@hw.ac.uk> wrote:

> This is mostly on-topic (I think)... what do you all think of using the inverse hyperbolic sine transformation,
> 
> ihs(y) = log( y + sqrt(y^2+1) )
> 
> as a way of dealing with the log-of-zero issue?  Defined at zero, and approximately log-ish except at small y.
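
[Illustrative note, not part of the original message] The formula quoted above can be checked numerically. The following is a minimal Python sketch, assuming non-negative y (as with sales); the function name ihs and the sample values are chosen here for illustration only. It compares the written-out formula with the standard library's math.asinh and with log(2y), which is what the transformation approaches for large y, so it differs from log(y) by roughly the constant log(2).

```python
import math

def ihs(y: float) -> float:
    """Inverse hyperbolic sine, written out exactly as in the quoted formula.

    Assumes y >= 0; for large negative y this form loses precision and
    math.asinh should be used instead.
    """
    return math.log(y + math.sqrt(y * y + 1.0))

# Illustrative values only: zero, small y, and large y.
for y in (0.0, 0.5, 1.0, 10.0, 1000.0):
    # log(2y) is the large-y approximation; it is undefined at y = 0.
    approx = math.log(2.0 * y) if y > 0 else float("nan")
    print(f"y={y:>7}: ihs={ihs(y):.4f}  asinh={math.asinh(y):.4f}  log(2y)={approx:.4f}")
```

At y = 0 the transformation is exactly 0, and the gap between ihs(y) and log(2y) shrinks as y grows, which is the "log-ish except at small y" behaviour described in the message.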
> 
> Nice little blog discussion here by Frances Woolley and commenters:
> 
> http://worthwhile.typepad.com/worthwhile_canadian_initi/2011/07/a-rant-on-inverse-hyperbolic-sine-transformations.html
> 
> And the references cited in her blog entry:
> 
> John B. Burbidge, Lonnie Magee and A. Leslie Robb, 1988. "Alternative Transformations to Handle Extreme Values of the Dependent Variable" Journal of the American Statistical Association Vol. 83, No. 401, pp. 123-127 
> 
> MacKinnon, James G & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, vol. 31(2), pages 315-39, May. 
> 
> Pence, Karen M. 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving," Contributions to Economic Analysis & Policy: Vol. 5: Iss. 1, Article 20. Available at: http://www.bepress.com/bejeap/contributions/vol5/iss1/art20
> 
> --Mark
> 
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
>> statalist@hsphsun2.harvard.edu] On Behalf Of Maarten Buis
>> Sent: 20 February 2014 12:46
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: About taking log on zero values
>> 
>> On Thu, Feb 20, 2014 at 1:16 PM, Austin Nichols wrote:
>>> Whether sales=0 means
>>> "literally nothing" or "so small that it could not be detected"
>>> you can't do any of the things suggested without introducing bias.
>> 
>> Austin would have been right if the missing values were "true" missing
>> values. For that case there is a literature that shows that replacing missing
>> values with some constant (e.g. the mean) and adding an indicator variable
>> for missingness leads to biased estimates, e.g.
>> (Jones 1996) or (Allison 2002). I also commented on that before, e.g.
>> <http://www.stata.com/statalist/archive/2007-12/msg00030.html>. I suspect
>> that Austin is basing his statement on that idea.
>> 
>> However, that idea does not apply here as the missing values aren't "true"
>> missing values: we know exactly how many sales happened to the units with a
>> missing value on log(sales). My solution just imposes a particular functional
>> form on the relationship between sales and the dependent variable with a
>> discrete jump at 0. If that is a reasonable model for the data then no bias
>> occurs; if it is not reasonable then there is a problem, but that is trivially true
>> for any functional form.
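
[Illustrative note, not part of the original message] The functional form described above, a log relationship with a discrete jump at 0, can be sketched as a pair of regressors. This is a minimal Python sketch under stated assumptions: the function name log_with_zero_indicator and the sample data are hypothetical, and filling ln(sales) with 0 where sales == 0 is an assumption made here, since the details of the original suggestion are in an earlier message not shown in this thread. When the indicator is included in the model, the constant used for the fill does not affect the fitted values, because the indicator's coefficient absorbs it.

```python
import math

def log_with_zero_indicator(sales):
    """Return (ln_sales, is_zero) pairs for use as two regressors.

    ln_sales is set to 0.0 where sales == 0 (an assumed fill value);
    is_zero flags those observations so the model can fit a jump at 0.
    """
    rows = []
    for s in sales:
        if s < 0:
            raise ValueError("sales must be non-negative")
        if s == 0:
            rows.append((0.0, 1))
        else:
            rows.append((math.log(s), 0))
    return rows

# Hypothetical sales figures, including one zero.
rows = log_with_zero_indicator([0, 1, 10, 250])
for ln_sales, is_zero in rows:
    print(f"ln_sales={ln_sales:.4f}  is_zero={is_zero}")
```

Note that sales == 0 and sales == 1 both give ln_sales = 0; only the indicator column separates them, which is why both columns need to enter the regression together.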
>> 
>> Hope this helps,
>> Maarten
>> 
>> Allison, P.D. (2002) Missing data. Thousand Oaks: Sage.
>> 
>> Jones, M.P. (1996) Indicator and stratification methods for missing
>> explanatory variables in multiple linear regression. Journal of the American
>> Statistical Association, 91, 222-230.
>> 
>> ---------------------------------
>> Maarten L. Buis
>> WZB
>> Reichpietschufer 50
>> 10785 Berlin
>> Germany
>> 
>> http://www.maartenbuis.nl
>> ---------------------------------
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> 



