
Re: st: About taking log on zero values

From: Alfonso Sanchez-Penalver
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: About taking log on zero values
Date: Thu, 20 Feb 2014 13:55:23 -0500

```I don't agree with that. There are two reasons why sales can take the value of zero:

1. Because it's an actual zero
2. Because sales cannot be negative and thus the variable is censored at zero.

As long as there's one observation with a zero value that is not a true zero, the real relationship between the dependent variable y and ln(sales) is broken, and that's what needs to be fixed. Your methodology only computes the average effect for the observations where sales is zero; it does not account for the true variation in the mean of ln(sales) across those observations. An intermediate Heckman or Tobit estimation of ln(sales) replaces the values of ln(sales) for those observations with the predicted ones, and so accounts for at least the explained variation across them. That produces a better estimate of the coefficient on ln(sales) than your methodology.

Best,

Alfonso Sanchez-Penalver

> On Feb 20, 2014, at 11:26 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>
>> On Thu, Feb 20, 2014 at 5:10 PM, Alfonso Sanchez-Penalver wrote:
>>
>> The Tobit or Heckman would be intermediate estimations to estimate the ln(sales) for the values where sales equals zero. You would then replace ln(sales) with the predicted values from the Tobit or Heckman model for those observations where sales equals zero, and use that variable as an explanatory variable in the model he wants to estimate. This is common practice when you need to use a wage rate in a model and you only have rates for those people who work.
>
> I understood that you meant something like that and it sounds wrong to
> me: the value log(0) should take in the model of interest is not a
> characteristic of sales alone but of the form of the relationship
> between sales and the dependent variable y. Tobit or Heckman models
> are not tools for determining that shape. In case of a wage as an
> independent variable I would definitely prefer to use log(wage), set
> to min(wage) when not working, combined with an indicator variable for
> not-working, rather than trying to force these qualitatively different
> states into one variable.
>
> -- Maarten
>
> ---------------------------------
> Maarten L. Buis
> WZB
> Reichpietschufer 50
> 10785 Berlin
> Germany
>
> http://www.maartenbuis.nl
> ---------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

```
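A minimal sketch of the indicator-variable approach Maarten describes in the quoted message, applied to sales: take the log where sales is positive, fill the zero-sales rows with a placeholder, and add a dummy that flags them. The thread concerns Stata; Python is used here only for illustration. The function name `log_with_zero_indicator` and the sample data are hypothetical, and reading "set to min(wage)" as "the log of the smallest positive value" is an assumption.

```python
import math


def log_with_zero_indicator(sales):
    """Build two regressors from a list of non-negative sales values.

    Returns (log_sales, is_zero):
      log_sales -- log(sales) where sales > 0; for zero sales, the log of
                   the smallest positive value (assumed reading of the
                   "set to min" rule in the thread)
      is_zero   -- 1 where sales == 0, else 0
    """
    if any(s < 0 for s in sales):
        raise ValueError("sales must be non-negative")
    positive = [s for s in sales if s > 0]
    if not positive:
        raise ValueError("need at least one positive sales value")

    # Placeholder used for the zero-sales rows.
    fill = math.log(min(positive))

    log_sales = [math.log(s) if s > 0 else fill for s in sales]
    is_zero = [1 if s == 0 else 0 for s in sales]
    return log_sales, is_zero


# Hypothetical data: two firms with zero sales, three with positive sales.
sales = [0.0, 10.0, 100.0, 0.0, 1000.0]
log_sales, is_zero = log_with_zero_indicator(sales)
```

When both columns enter a linear model for y, the coefficient on `is_zero` absorbs the gap between the zero-sales observations and the placeholder value, so the slope on `log_sales` is determined by the positive-sales observations alone. This sketch does not implement the Tobit or Heckman alternative that Alfonso argues for.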