Re: st: RE: lnskew question
Setting aside the k term, the retransformation problem is actually more
complex than the issue of whether to use a parametric or less parametric
retransformation factor. If log(y) = xb + e, with E(x'e) = 0, then the
expectation of y given x is
E(y|x) = [exp(xb)][E(exp(e))].
The last term is consistently estimated by the smearing coefficient if the
error e is homoscedastic in x. But if the error is heteroscedastic in some
of the covariates, then one needs to introduce additional terms to
reflect the heteroscedasticity.
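
As a rough illustration of the homoscedastic case, here is a minimal Python sketch of the smearing retransformation with a single covariate. The function names fit_line and smearing_predict are made up for this example and are not part of any Stata command or package; the sketch assumes y > 0 and an error whose distribution does not depend on x.

```python
import math

def fit_line(x, z):
    """Ordinary least squares fit of z = a + b*x for a single covariate."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return a, b

def smearing_predict(x, y, x_new):
    """Predict E(y|x_new) from a regression of log(y) on x.

    The smearing factor is the sample mean of exp(residual), which
    estimates E(exp(e)) when e is homoscedastic in x.
    """
    logy = [math.log(yi) for yi in y]
    a, b = fit_line(x, logy)
    resid = [zi - (a + b * xi) for xi, zi in zip(x, logy)]
    smear = sum(math.exp(r) for r in resid) / len(resid)
    preds = [math.exp(a + b * xn) * smear for xn in x_new]
    return preds, smear
```

Because the residuals average to zero when the model has an intercept, the smearing factor is never below 1, so naive exponentiation of the fitted values understates the mean whenever the residuals are not all zero.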
The easiest case to see is where e is normally distributed with log-scale
variance v(x). Then we get
E(y|x) = exp(xb + 0.5 v(x)).
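
The normal-error formula can be checked numerically. The sketch below, in Python with hypothetical function names, compares exp(xb + 0.5 v) against a simulated mean of exp(xb + e) for e drawn from a normal distribution with variance v. The values of xb and v are arbitrary inputs chosen for illustration, not estimates from any data.

```python
import math
import random

def lognormal_mean(xb, v):
    """E(y|x) when log(y) = xb + e and e ~ Normal(0, v)."""
    return math.exp(xb + 0.5 * v)

def simulated_mean(xb, v, n=100000, seed=1):
    """Monte Carlo estimate of E(exp(xb + e)) with e ~ Normal(0, v)."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    return sum(math.exp(xb + rng.gauss(0.0, sd)) for _ in range(n)) / n
```

If v differs across observations, the correction 0.5 v(x) differs too, which is why a single retransformation factor is not enough under heteroscedasticity.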
If the error is not normal but is heteroscedastic by a categorical
variable, then smearing by subgroup will do.
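
Smearing by subgroup amounts to computing a separate factor within each category. A minimal Python sketch follows, assuming the log-scale residuals and group labels are already in hand; the name smearing_by_group is invented for this example.

```python
import math
from collections import defaultdict

def smearing_by_group(residuals, groups):
    """Return one smearing factor per group: the mean of exp(residual)
    taken over the observations in that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(residuals, groups):
        sums[g] += math.exp(r)
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}
```

Each prediction exp(xb) would then be multiplied by the factor for that observation's own group, so a group with more dispersed residuals gets a larger correction.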
If the errors are heteroscedastic in a continuous variable or in several
variables, then one alternative is to use a GLM with a log link,
possibly with the gamma family. As long as the specification of the link and the
covariates is correct, the GLM will be unbiased. If the family (for
example, the gamma) is incorrect, then the losses will be those of
precision. The appropriate family for the GLM depends on the relationship
between the mean and variance functions in the application.
Note that the GLM predictions can be retransformed by exponentiation
because the GLM estimates log(E(y|x)), not E(log(y)|x) as least
squares on log y does.
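
The distinction between the log of the mean and the mean of the log can be seen on a handful of numbers. This small Python sketch uses made-up values and hypothetical function names; it is only meant to show why exponentiating an average of logs gives the geometric mean rather than the arithmetic mean.

```python
import math

def log_of_mean(y):
    """log(E(y)): the quantity targeted with a log link."""
    return math.log(sum(y) / len(y))

def mean_of_log(y):
    """E(log(y)): the quantity targeted by least squares on log y."""
    return sum(math.log(v) for v in y) / len(y)
```

For y = [1, 10, 100], exponentiating mean_of_log gives 10 (the geometric mean), while exponentiating log_of_mean gives 37 (the arithmetic mean). The gap between the two is what a retransformation factor has to make up.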
For some discussion of this, see Blough et al. in the Journal of Health
Economics (April 1999) for a case with the additional wrinkle of a large
zero mass. If there is no zero mass problem, Robert could consider the
second part of Blough's model.
At 09:56 AM 12/9/2002 +0000, you wrote:
> I've been tinkering with log-transforming an outcome variable for a
> regression, but I thought lnskew0 might be a good trick (and it was doing
> better than ln()). However, I wonder how I could convert the estimates
> back to the natural units. For example, I've seen the smearing technique
> for converting regression estimates scaled in ln(dollars) back to dollars,
> but I can't imagine what's involved in getting back from whatever it is
> lnskew0 creates. Then I thought, somebody on STATAlist might
> know. [Couldn't find anything in the list archives or manual.]
Without very much context, it is difficult to advise, but there
are issues here on various levels.
1. On a purely algebraic level, the back transformation corresponding
to t = ln(y - k) is exp(t) + k. Note that the constant k is left
behind by -lnskew0- as r(gamma).
2. Removal of bias produced by transformation is not quite so
straightforward. In the case of smearing, for example, Duan's
original paper makes clear that the smearing idea leads to a
very simple recipe for simple log transformation but typically
a messy recipe for other transformations. See
Duan, N. 1983. Smearing estimate: a nonparametric retransformation
method. Journal of the American Statistical Association 78: 605-610.
I'm aware of two Stata programs for smearing, Richard Goldstein's
-predlog- (STB-29) and my own -smear- (unpublished), but both
concentrate entirely on log transformation (and to that
extent the name -smear- of mine is a misnomer). To do smearing
as an antidote to ln(y - k), you would need to write your own
code, I believe.
3. ln(y - k) will be less skew than ln(y) in almost all cases
but I wish you luck in finding a systematic,
scientific interpretation of k. Whenever, as here, there is concern
for getting predictions in the original metric, generalised
linear models offer, in my view, a far superior approach.
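
For what it is worth, the algebra in point 1 can be combined with a smearing factor in a few lines, under strong assumptions. The Python sketch below treats k as a fixed, known constant (ignoring that -lnskew0- estimates it from the data), assumes the transformation has the form t = ln(y - k), and assumes the errors are homoscedastic. The function names are invented for this example and do not correspond to any existing Stata program.

```python
import math

def back_transform(t, k):
    """Algebraic inverse of t = ln(y - k)."""
    return math.exp(t) + k

def smeared_back_transform(xb, residuals, k):
    """Estimate E(y|x) when ln(y - k) = xb + e with homoscedastic e.

    Since y = k + exp(xb) * exp(e), the mean is k + exp(xb) * E(exp(e)),
    and E(exp(e)) is estimated by the mean of exp(residual).
    """
    smear = sum(math.exp(r) for r in residuals) / len(residuals)
    return k + math.exp(xb) * smear
```

The shift k is added back after the smearing factor is applied, not before, because only the exp(xb) part is scaled by E(exp(e)). None of this addresses the uncertainty in k itself or heteroscedastic errors.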
I've been tinkering with log-transforming an outcome variable for a
regression, but I thought lnskew0 might be a good trick (and it was doing
better than ln()). However, I wonder how I could convert the estimates
back to the natural units. For example, I've seen the smearing technique
for converting regression estimates scaled in ln(dollars) back to dollars,
but I can't imagine what's involved in getting back from whatever it is
lnskew0 creates. Then I thought, somebody on STATAlist might
know. [Couldn't find anything in the list archives or manual.]
Robert C. Saunders, M.P.P.
Box 90 GPC, Vanderbilt University
Willard G. Manning
Dept. of Health Studies -- MC 2007
The University of Chicago
5841 So. Maryland Ave.
Chicago, IL 60637
Phone: 773-834-1971 (direct)