In re: adding alpha to X to make ln(X) nonmissing
Why does this operation come up so often, when it is so often a bad
idea? I have seen several papers this week that add some constant to
X so that ln(X) can be regressed on some variables, or some variable
can be regressed on it. Wouldn't you be just as well off imputing
2*atan(X)-2*atan(1) or somesuch? Is there a well-known good reference
on this subject?
Just now, when looking up the ref for an adjacent thread on btscs.ado,
I ran across Oneal & Russett (2001) which acknowledges that Beck,
Katz, and Tucker (1998) pointed out an error, and then replies to
another critique with this (p.480):
"
Before taking the logarithm [of trade volume in $millions] we assigned
a different value to the trade variable for dyads that report no
trade. Some value must be imputed because the logarithm of zero is
undefined. We use $100,000 [so really it was ln(0.1)]; Green, Kim,
and Yoon used $1. It is this that accounts for most of the
differences between our results and theirs.
"
Oneal, John R. and Bruce Russett. 2001. Clear and Clean: The Fixed
Effects of the Liberal Peace. International Organization, Vol. 55, No.
2. (Spring, 2001), pp. 469-485.
http://links.jstor.org/sici?sici=0020-8183%28200121%2955%3A2%3C469%3ACACTFE%3E2.0.CO%3B2-A
Green, Donald P., Soo Yeon Kim, and David H. Yoon. 2001. Dirty Pool.
International Organization, Vol. 55, No. 2. (Spring, 2001), pp.
441-468.
http://links.jstor.org/sici?sici=0020-8183%28200121%2955%3A2%3C441%3ADP%3E2.0.CO%3B2-N
Beck, Nathaniel, Jonathan N. Katz and Richard Tucker. 1998. Taking
Time Seriously: Time-Series-Cross-Section Analysis with a Binary
Dependent Variable. American Journal of Political Science, 42:
1260-1288.
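
To make the arithmetic in the Oneal and Russett passage concrete: with
trade measured in $millions, a fill-in of $100,000 enters as ln(0.1) and
a fill-in of $1 enters as ln(0.000001). The Python sketch below uses a
small made-up data set (the trade and outcome values and the helper names
are hypothetical, not taken from either paper) to show how far a simple
bivariate OLS slope can move with that choice alone.

```python
import math

def ols_slope(xs, ys):
    """Slope from a bivariate least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: trade in $millions (three dyads report no trade)
# and an arbitrary outcome. Invented purely for illustration.
trade = [0.0, 0.0, 0.0, 0.5, 2.0, 10.0, 50.0, 200.0]
outcome = [1.0, 0.9, 1.1, 0.8, 0.6, 0.5, 0.3, 0.2]

def slope_with_fill(fill):
    """Replace zero trade by `fill` ($millions), log it, regress outcome on it."""
    logged = [math.log(t if t > 0 else fill) for t in trade]
    return ols_slope(logged, outcome)

slope_100k = slope_with_fill(0.1)      # zeros become $100,000, i.e. ln(0.1)
slope_1dollar = slope_with_fill(1e-6)  # zeros become $1, i.e. ln(0.000001)

print("fill-in $100,000:", slope_100k)
print("fill-in $1:      ", slope_1dollar)
```

With these invented numbers the slope under the $1 fill-in is less than
half the size of the slope under the $100,000 fill-in, although the
nonzero observations are identical in both runs.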
See also:
ssc install transint
h transint
http://www.stata.com/statalist/archive/2006-11/msg00294.html
Nick Cox <n.j.cox@durham.ac.uk> wrote:
I think the objection to that is that it is
dimensionally unbalanced. That is, X2 and
whatever is added to it should have the same units
and the same dimensions. (Perhaps economists don't
care about these things, but my wannabe physicist
persona does.)
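
One way to see the units point, as a rough sketch: changing the units of
X shifts ln(X) by the same constant for every observation, so gaps
between observations are untouched, whereas ln(X + c) with a fixed c
changes those gaps. The values below and the constant c = 1 are
assumptions chosen only for illustration.

```python
import math

# Hypothetical values of X recorded in $millions.
x_millions = [0.5, 20.0]
dollars_per_million = 1_000_000
c = 1.0  # assumed additive constant, applied without regard to units

def log_gap(values, shift=0.0):
    """Difference in log(value + shift) between the two observations."""
    low, high = values
    return math.log(high + shift) - math.log(low + shift)

# The same two observations re-expressed in dollars.
x_dollars = [v * dollars_per_million for v in x_millions]

gap_log_millions = log_gap(x_millions)       # plain log, $millions
gap_log_dollars = log_gap(x_dollars)         # plain log, dollars
gap_shift_millions = log_gap(x_millions, c)  # log(X + 1), $millions
gap_shift_dollars = log_gap(x_dollars, c)    # log(X + 1), dollars

print(gap_log_millions, gap_log_dollars)
print(gap_shift_millions, gap_shift_dollars)
```

The first pair of gaps agrees up to rounding; the second pair does not,
so any fit that uses ln(X + c) depends on the units in which X happens to
be recorded unless c is rescaled along with X.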
Rodrigo A. Alfaro wrote:
> Following Maarten's suggestion:
> lnY = B0 + B1*lnX1 + B2*ln(X2 + exp(alpha)) + epsilon ??
>
> Maarten Buis wrote:
>
> > However, Nick just explained that you do not need to do that, and I
> > agree. Adding some constant to a variable so that the log doesn't
> > become missing is making an error: maybe a necessary error, maybe
> > not, but still an error. Why would you expect your data to be able
> > to inform you about an error?
Nick Cox <n.j.cox@durham.ac.uk> wrote:
I would forget about the constraint. If your specification
is sensible, a positive value for alpha will emerge from the
estimation. If it doesn't, you have a signal that the specification
is suspect in that regard.
Alternatively, just try log(x + 1). The extra degree of freedom
might come in handy. I used to think log(x + 1) was a fudge but
I now regard it more fondly. It's a function that goes to 0 as x goes
to 0 from above and it behaves like log x as x gets very large,
so it is fairly well motivated.
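
The two limiting properties just described can be checked numerically.
This is only a sketch in Python using the standard library's math.log1p;
the helper name and the grid of x values are arbitrary choices.

```python
import math

def log_plus_one(x):
    """log(x + 1), computed with log1p for accuracy when x is near 0."""
    return math.log1p(x)

# Exactly 0 at x = 0, so zeros in the data need no special treatment.
at_zero = log_plus_one(0.0)

# The gap log(x + 1) - log(x) equals log(1 + 1/x), which shrinks
# toward 0 as x grows.
gaps = {x: log_plus_one(x) - math.log(x) for x in (1.0, 100.0, 1e6)}

print("log(0 + 1) =", at_zero)
for x, gap in gaps.items():
    print(f"x = {x:g}: log(x + 1) - log(x) = {gap:.3g}")
```

The gap is log(2) at x = 1 and falls steadily from there, so how closely
log(x + 1) tracks log x depends on how large the x values are in the
units used.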