
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: About taking log on zero values


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: About taking log on zero values
Date   Thu, 20 Feb 2014 22:29:14 +0100

On Thu, Feb 20, 2014 at 7:55 PM, Alfonso Sanchez-Penalver wrote:
> I don't agree with that. There are two reasons why sales can take the value of zero:
>
> 1. Because it's an actual zero
> 2. Because sales cannot be negative and thus the variable is censored at zero.
>
> As long as there's one observation with a zero value that is not a true zero the real relationship between the dependent variable y and ln(sales) is broken, and that's what needs to be fixed.

I am trying to think of an owner of a neighbourhood shop and I imagine
asking her or him if (s)he ever encountered scenario 2. I cannot
imagine any other response than a very very very blank stare (if (s)he
is polite). Either you sell something or you don't.

In other situations you might get a mixture of different meanings of
the value 0. Consider a measurement device that measures the
concentration of a substance. Such devices are not infinitely precise,
and there will be a concentration below which they can no longer detect
the substance. So in such cases the value 0 could mean that the substance
is totally absent (often unlikely) or that the concentration is very very
low. However, I cannot imagine how, even in that case, either Heckman or
Tobit would be a solution.
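To make the detection-limit scenario concrete, here is a small hypothetical simulation. Everything in it is an assumption for illustration: the helper names `simulate_readings` and `classify_zeros`, the share of samples in which the substance is truly absent (30%), the lognormal distribution of concentrations when it is present, and the detection limit of 0.05. The point it shows is that once the device records a 0, true absences and below-limit readings look identical in the data; only the simulated "truth" lets us tell them apart.

```python
import random

def simulate_readings(n, p_absent, detection_limit, seed=1):
    """Hypothetical sketch: a device that reports 0 whenever the true
    concentration is below its detection limit."""
    rng = random.Random(seed)
    truth, recorded = [], []
    for _ in range(n):
        if rng.random() < p_absent:
            c = 0.0  # substance truly absent (assumed share p_absent)
        else:
            # Substance present, possibly at a very low concentration
            # (assumed lognormal distribution, purely illustrative).
            c = rng.lognormvariate(-2.0, 1.0)
        truth.append(c)
        # The device cannot distinguish "absent" from "below the limit".
        recorded.append(c if c >= detection_limit else 0.0)
    return truth, recorded

def classify_zeros(truth, recorded):
    """Split recorded zeros into true absences and below-limit readings.
    This is only possible here because we simulated the truth."""
    absent = sum(1 for t, r in zip(truth, recorded) if r == 0.0 and t == 0.0)
    below = sum(1 for t, r in zip(truth, recorded) if r == 0.0 and t > 0.0)
    return absent, below

truth, recorded = simulate_readings(n=10_000, p_absent=0.3, detection_limit=0.05)
absent, below = classify_zeros(truth, recorded)
print(f"recorded zeros: {absent + below} "
      f"(truly absent: {absent}, below detection limit: {below})")
```

In real data only `recorded` is observed, so the split printed at the end is exactly the information that is missing.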

> Your methodology only computes the average effect of the observations for which sales is zero, while not accounting for the true variation in the means of the log of sales in the observations where sales equals zero. A Heckman or Tobit intermediate estimation of the ln(sales) will replace the values of ln(sales) with the predicted ones, and thus account for at least the explained variation across those observations, thus producing a better estimate of the coefficient on ln(sales) than your methodology.

What you describe is a general measurement problem. If you are worried
about that, then you should do much much more than just look at 0s.
For that reason alone I would suspect that the Heckman or Tobit
solution would make the problem worse rather than better.
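The message does not spell out the "methodology" being criticized in the quote above. A common approach that matches the description ("the average effect of the observations for which sales is zero") is to set ln(sales) to 0 for zero-sales observations and add an indicator for those observations. The pure-Python sketch below assumes that is what is meant; the helper names `log_with_zero_indicator` and `fit_with_indicator` are hypothetical, and the fit covers only this single regressor with no other covariates. It shows why the indicator's coefficient is an average: because ln(sales) is 0 whenever the indicator is 1, least squares separates into a simple regression among positive sales plus the mean of y among the zero-sales rows.

```python
import math

def log_with_zero_indicator(sales):
    """Hypothetical helper: ln(sales) with 0 substituted for zero sales,
    plus an indicator flagging those zero-sales observations."""
    if any(s < 0 for s in sales):
        raise ValueError("sales cannot be negative")
    ln_sales = [math.log(s) if s > 0 else 0.0 for s in sales]
    zero_sales = [1 if s == 0 else 0 for s in sales]
    return ln_sales, zero_sales

def fit_with_indicator(y, sales):
    """OLS of y on a constant, ln_sales and zero_sales (assumed model:
    y = b0 + b1*ln_sales + b2*zero_sales, no other covariates).

    Since ln_sales == 0 whenever zero_sales == 1, the problem separates:
    b0 and b1 come from the simple regression among positive sales, and
    b0 + b2 equals the mean of y among the zero-sales observations."""
    ln_sales, zero_sales = log_with_zero_indicator(sales)
    pos = [(x, yi) for x, d, yi in zip(ln_sales, zero_sales, y) if d == 0]
    zer = [yi for d, yi in zip(zero_sales, y) if d == 1]
    xbar = sum(x for x, _ in pos) / len(pos)
    ybar = sum(yi for _, yi in pos) / len(pos)
    sxy = sum((x - xbar) * (yi - ybar) for x, yi in pos)
    sxx = sum((x - xbar) ** 2 for x, _ in pos)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    b2 = sum(zer) / len(zer) - b0  # shift that reproduces the zero-group mean
    return b0, b1, b2

# Toy data: y = 1 + 2*ln(sales) for positive sales; two zero-sales rows.
y = [1.0, 3.0, 5.0, 0.2, 0.4]
sales = [1.0, math.e, math.e ** 2, 0.0, 0.0]
b0, b1, b2 = fit_with_indicator(y, sales)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Here b0 and b1 recover the intercept 1 and slope 2 from the positive-sales rows, and b0 + b2 equals 0.3, the mean of y for the zero-sales rows, which is the "average effect" the quote refers to.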

My rule of thumb is not to stack fragile methods like Heckman or Tobit
with other models. In all likelihood you make things a lot worse when
you do.

-- Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC