Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: About taking log on zero values


From   Austin Nichols <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: About taking log on zero values
Date   Sat, 1 Mar 2014 09:24:52 -0500

Maarten--

There is one piece of advice left out of this summary of the
thread: don't do that.

That is, leave the regression as is. Suppose x is zero, and lnx
undefined, because x falls below some threshold such as 0.5 and
is censored, perhaps due to privacy concerns. Then leaving out
those cases is selection on the observable x, and Stata takes
exactly the right approach when it drops them. Adding those
observations back in with a dummy for missing lnx will tend to
underestimate the variability of the estimated semielasticity,
because it conflates the error variance in the two parts of that
model, and that can lead to biased inference, as in the toy
simulation below.

If the x=0 cases are true zeros, then the semielasticity is
undefined for those cases, and again Stata takes exactly the
right approach in dropping them.

If there is a lower detection limit and measurement error in x,
and that measurement error may be correlated with other
variables in the model, all bets are off. One might want to
impute x in that case, but it depends.

If there is the potential for specification error, e.g. the
semielasticity is not constant across x, then there is no
guarantee that any method will recover the true mean
semielasticity, and more structure needs to be put on the
problem to find the best solution.

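set seed 1
program drop _all
program define msim, rclass
    * true coefficient on ln(x) is .2; c is correlated with ln(x)
    drawnorm e, n(1000) clear
    generate c = rnormal()
    generate x = exp(c + rnormal())
    generate y = ln(x)/5 + c + e
    generate lnx = ln(x)
    * (1) full data, nothing censored
    regress y lnx c
    return scalar b = _b[lnx]
    test lnx = .2
    return scalar p = r(p)
    * (2) censor lnx where x < .5; regress drops those cases
    replace lnx = . if x < .5
    regress y lnx c
    return scalar tb = _b[lnx]
    test lnx = .2
    return scalar tp = r(p)
    * (3) fill censored lnx with the observed minimum and add a dummy
    generate milnx = missing(lnx)
    summarize lnx, meanonly
    replace lnx = r(min) if missing(lnx)
    regress y lnx milnx c
    return scalar cb = _b[lnx]
    test lnx = .2
    return scalar cp = r(p)
    ereturn clear
end
simulate, reps(10000): msim
* tb and cb need not match exactly: (3) also uses the filled-in
* cases to estimate the coefficient on c, so compare the estimates
* rather than assert equality
summarize b tb cb
* rejection rates of the true null (coefficient = .2) at the 10% level
generate rejo = p < .1
generate rejt = tp < .1
generate rejc = cp < .1
summarize rej*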


On Fri, Feb 21, 2014 at 8:17 AM, Maarten Buis <[email protected]> wrote:
> On Fri, Feb 21, 2014 at 1:53 PM, jose maria pacheco de souza  wrote:
>> It would be very useful if someone could make an organized summary of the
>> suggestions.
>
> Nick started out with a well organized and large set of suggestions and
> considerations <http://www.stata.com/statalist/archive/2014-02/msg00790.html>
>
> To that I added the option of creating an indicator variable for x=0,
> setting x to the smallest non-zero value, taking the log of that x, and
> adding both the log and the indicator variable to the model
> <http://www.stata.com/statalist/archive/2014-02/msg00826.html>.
> That idea was independently repeated by Daniel Feenberg
> <http://www.stata.com/statalist/archive/2014-02/msg00874.html>.
>
> Alternative transformations that don't have a problem with x=0 were
> proposed by Mark Schaffer
> <http://www.stata.com/statalist/archive/2014-02/msg00837.html> and
> Nick Cox <http://www.stata.com/statalist/archive/2014-02/msg00848.html>.
>
> Alfonso Sánchez-Peñalver suggested using Tobit or Heckman models to
> predict alternative values for the 0s here:
> <http://www.stata.com/statalist/archive/2014-02/msg00845.html>.
>
> Hope this helps,
> Maarten
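
For reference, the indicator-variable approach summarized in the quoted
message can be sketched on a single dataset as follows. This is only an
illustration under assumptions: the variable names (xtrue, x, xzero,
lnx_drop, lnx_ind), the seed, and the 0.5 cutoff are made up here, and
the data-generating process is borrowed from the toy simulation above.

```stata
* Illustrative sketch only; names, seed, and cutoff are assumptions
clear
set seed 12345
set obs 1000
generate c = rnormal()
generate xtrue = exp(c + rnormal())
generate y = ln(xtrue)/5 + c + rnormal()
* Pretend values below .5 are recorded as zero
generate x = cond(xtrue < .5, 0, xtrue)

* Option 1: take logs and let regress drop the x == 0 cases,
* since ln(0) evaluates to missing
generate lnx_drop = ln(x)
regress y lnx_drop c

* Option 2: indicator for x == 0, with zeros set to the smallest
* nonzero x before taking logs
generate byte xzero = (x == 0)
summarize x if x > 0, meanonly
generate lnx_ind = ln(cond(x == 0, r(min), x))
regress y lnx_ind xzero c
```

Comparing the coefficient and standard error on the log term across the
two fits shows what the extra observations and the indicator change. A
single draw says nothing about rejection rates, which is what the
simulation is for.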
