
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: About taking log on zero values

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: About taking log on zero values
Date	Thu, 20 Feb 2014 15:45:47 +0000

This is asinh() in Stata. The main problem in my view is a marketing
one. For many groups this is an exotic beast they have never heard of
or have never used since a brief encounter years ago. Imagine the
puzzlement in a presentation.

It's not an equivalent, but I am fond of cube roots as the simplest
(meaning most familiar) function that copes with zeros as well as
positive values (not to mention negatives). I can vouch that you may
have to struggle to get it past reviewers.
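To make the comparison concrete, here is a minimal sketch in Python rather than Stata. The helper names ihs() and cube_root() are hypothetical, chosen only for illustration; ihs() writes out the log( y + sqrt(y^2+1) ) formula quoted below, and math.asinh() is the library equivalent. Both functions are defined at zero and for negative values, and for large y the inverse hyperbolic sine is close to log(2y).

```python
import math

def ihs(y):
    """Inverse hyperbolic sine written out as log(y + sqrt(y^2 + 1))."""
    return math.log(y + math.sqrt(y * y + 1.0))

def cube_root(y):
    """Real cube root, defined for negative, zero and positive y."""
    # abs() and copysign() avoid raising a negative float to a fractional power
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

# Illustrative values only: a negative, zero, a small and two larger positives
for y in (-8.0, 0.0, 0.5, 8.0, 1000.0):
    print(y, math.asinh(y), ihs(y), cube_root(y))
```

Near zero asinh(y) is roughly y itself, which is where it stops behaving like a logarithm.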

This is all a problem because there are so many reasons to use logs,
given their appearance in so many nonlinear relationships, including
those that are easily linearized. Anything else seems ad hoc (on a
fine day, to be translated as "fit for purpose"*).

Nick
[email protected]

*You heard it here first.


On 20 February 2014 13:53, Schaffer, Mark E <[email protected]> wrote:
> This is mostly on-topic (I think)... what do you all think of using the inverse hyperbolic sine transformation,
>
> ihs(y) = log( y + sqrt(y^2+1) )
>
> as a way of dealing with the log-of-zero issue?  Defined at zero, and approximately log-ish except at small y.
>
> Nice little blog discussion here by Frances Woolley and commenters:
>
> http://worthwhile.typepad.com/worthwhile_canadian_initi/2011/07/a-rant-on-inverse-hyperbolic-sine-transformations.html
>
> And the references cited in her blog entry:
>
> John B. Burbidge, Lonnie Magee and A. Leslie Robb, 1988. "Alternative Transformations to Handle Extreme Values of the Dependent Variable" Journal of the American Statistical Association Vol. 83, No. 401, pp. 123-127
>
> MacKinnon, James G & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, vol. 31(2), pages 315-39, May.
>
> Pence, Karen M. 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving," Contributions to Economic Analysis & Policy: Vol. 5: Iss. 1, Article 20. Available at: http://www.bepress.com/bejeap/contributions/vol5/iss1/art20
>
> --Mark
>
>> -----Original Message-----
>> From: [email protected] [mailto:owner-
>> [email protected]] On Behalf Of Maarten Buis
>> Sent: 20 February 2014 12:46
>> To: [email protected]
>> Subject: Re: st: About taking log on zero values
>>
>> On Thu, Feb 20, 2014 at 1:16 PM, Austin Nichols wrote:
>> >  Whether sales=0 means
>> > "literally nothing" or "so small that it could not be detected"
>> > you can't do any of the things suggested without introducing bias.
>>
>> Austin would have been right if the missing values were "true" missing
>> values. For that case there is a literature that shows that replacing missing
>> values with some constant (e.g. the mean) and adding an indicator variable
>> for missingness leads to biased estimates, e.g.
>> (Jones 1996) or (Allison 2002). I also commented on that before, e.g.
>> <http://www.stata.com/statalist/archive/2007-12/msg00030.html>. I suspect
>> that Austin is basing his statement on that idea.
>>
>> However, that idea does not apply here as the missing values aren't "true"
>> missing values: we know exactly how many sales happened to the units with a
>> missing value on log(sales). My solution just imposes a particular functional
>> form on the relationship between sales and the dependent variable with a
>> discrete jump at 0. If that is a reasonable model for the data then no bias
>> occurs, if it is not reasonable then there is a problem, but that is trivially true
>> for any functional form.
>>
>> Hope this helps,
>> Maarten
>>
>> Allison, P.D. (2002) Missing data. Thousand Oaks: Sage.
>>
>> Jones, M.P. (1996) Indicator and stratification methods for missing
>> explanatory variables in multiple linear regression. Journal of the American
>> Statistical Association, 91, 222-230.
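
One reading of the functional form Maarten describes above (log(sales) as a predictor, with a discrete jump at zero) can be sketched as follows. This is Python rather than Stata, and it is an assumption about the construction, not his code: the function name, the fill value of 0 and the sales figures are all made up for illustration, and sales is assumed to be nonnegative.

```python
import math

def log_with_zero_indicator(sales, fill=0.0):
    """Return two predictor columns: log(sales) with zeros set to `fill`,
    and a 0/1 indicator that marks the zero-sales observations."""
    log_sales = [math.log(s) if s > 0 else fill for s in sales]
    is_zero = [0 if s > 0 else 1 for s in sales]
    return log_sales, is_zero

sales = [0.0, 1.0, 10.0, 0.0, 100.0]  # made-up values, not from the thread
log_sales, is_zero = log_with_zero_indicator(sales)
print(log_sales)
print(is_zero)
```

When both columns enter a regression, the coefficient on the indicator absorbs the jump at zero. The fill value is arbitrary in that case: changing it shifts the indicator's coefficient but leaves the fitted values and the slope on log(sales) unchanged.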

