
Re: st: RE: lnskew question


From   Willard Manning <w-manning@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: lnskew question
Date   Mon, 09 Dec 2002 14:17:49 -0600

Robert, Nick

Setting aside the k term, the retransformation problem is actually more complex than the issue of whether to use a parametric or less parametric retransformation factor. If log(y) = xb + e, with E(x'e) = 0, then the expectation of y given x is

E(y|x) = [exp(xb)][E(exp(e))].

The last term is consistently estimated by the smearing coefficient (the sample mean of the exponentiated residuals) if the error e is homoscedastic in x. But if the error is heteroscedastic in some of the covariates, then additional terms are needed to reflect the heteroscedasticity.
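For the homoscedastic case, a minimal sketch of the smearing calculation in plain Python may help. The data are simulated and every name here (ols_fit, smearing_factor, the coefficients 1 and 0.5, the error sd 0.8) is an assumption made for illustration, not something from the original exchange.

```python
import math
import random

def ols_fit(x, t):
    """Least-squares intercept and slope for t = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    mt = sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    b = sxt / sxx
    return mt - b * mx, b

def smearing_factor(x, t, a, b):
    """Mean of exp(residual): estimates E(exp(e)) when e is homoscedastic."""
    return sum(math.exp(ti - (a + b * xi)) for xi, ti in zip(x, t)) / len(x)

# Simulated (hypothetical) data: log(y) = 1 + 0.5*x + e, e ~ N(0, 0.8^2)
rng = random.Random(12345)
x = [rng.uniform(0, 2) for _ in range(5000)]
y = [math.exp(1 + 0.5 * xi + rng.gauss(0, 0.8)) for xi in x]
log_y = [math.log(yi) for yi in y]

a, b = ols_fit(x, log_y)
smear = smearing_factor(x, log_y, a, b)

# Average prediction on the original scale, without and with smearing
naive_mean = sum(math.exp(a + b * xi) for xi in x) / len(x)
smeared_mean = smear * naive_mean
observed_mean = sum(y) / len(y)
```

Comparing naive_mean and smeared_mean with observed_mean shows how far plain exponentiation of xb undershoots the mean of y, and how much of the gap the single smearing factor closes when the error variance does not depend on x.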

The easiest case to see is where e is normally distributed with log-scale variance v(x). Then we get,

E(y|x) = exp(xb + 0.5 v(x))
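The step behind that expression can be written out, assuming (as stated) that e given x is normal with mean zero and variance v(x). The middle line is the standard result for the mean of a lognormal variable.

```latex
\begin{align*}
e \mid x &\sim N\bigl(0,\, v(x)\bigr) \\
E\bigl(\exp(e) \mid x\bigr) &= \exp\bigl(0.5\, v(x)\bigr) \\
E(y \mid x) &= \exp(xb)\, E\bigl(\exp(e) \mid x\bigr) = \exp\bigl(xb + 0.5\, v(x)\bigr)
\end{align*}
```

When v(x) is a constant v, the factor exp(0.5 v) is the same for every observation, which is the homoscedastic case where a single retransformation factor suffices.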

If the error is not normal but is heteroscedastic by a categorical variable, then smearing by subgroup will do.
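Smearing by subgroup amounts to computing a separate factor within each category. The sketch below uses made-up residuals for two hypothetical groups, "a" and "b", with different error spreads; in practice the residuals would come from the log-scale regression.

```python
import math
import random
from collections import defaultdict

def smearing_by_group(residuals, groups):
    """Return {group: mean(exp(residual))}, computed within each group."""
    total = defaultdict(float)
    count = defaultdict(int)
    for r, g in zip(residuals, groups):
        total[g] += math.exp(r)
        count[g] += 1
    return {g: total[g] / count[g] for g in total}

# Hypothetical residuals: sd 0.3 in group "a", sd 1.0 in group "b"
rng = random.Random(2002)
groups = ["a"] * 4000 + ["b"] * 4000
residuals = ([rng.gauss(0, 0.3) for _ in range(4000)]
             + [rng.gauss(0, 1.0) for _ in range(4000)])

by_group = smearing_by_group(residuals, groups)

# A single pooled factor, for comparison
pooled = sum(math.exp(r) for r in residuals) / len(residuals)
```

The pooled factor lies between the two group-specific factors, so using it would overcorrect predictions for the low-variance group and undercorrect them for the high-variance group.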

If the errors are heteroscedastic in a continuous variable or in several variables, then one alternative is to use a GLM with a log link -- possibly with the gamma family. As long as the link and the covariates are correctly specified, the GLM estimates of E(y|x) will be consistent. If the family (for example, the gamma) is incorrect, the loss is one of precision, not consistency. The appropriate family for the GLM depends on the relationship between the mean and variance functions in the application.

Note that GLM predictions on the log scale can be retransformed by simple exponentiation, because the GLM estimates log(E(y|x)), not E(log(y)|x) as least squares on log y does.
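To illustrate the difference between the two estimands, here is a rough sketch that fits a gamma-family, log-link GLM by iteratively reweighted least squares in plain Python and compares it with least squares on log y. The simulated design is an assumption chosen for convenience: the log-scale error is normal with variance 0.2 + 0.4 x, so that log(E(y|x)) = 1.1 + 0.7 x by construction, while E(log(y)|x) = 1 + 0.5 x. The function names are hypothetical.

```python
import math
import random

def ols_fit(x, t):
    """Least-squares intercept and slope for t = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    mt = sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    b = sxt / sxx
    return mt - b * mx, b

def gamma_log_glm(x, y, iterations=25):
    """Fit log(E(y|x)) = a + b*x by iteratively reweighted least squares.

    With a log link and the gamma variance function (variance proportional
    to the squared mean) the IRLS weights are constant, so each step is an
    unweighted regression of the working response on x.
    """
    a, b = ols_fit(x, [math.log(yi) for yi in y])  # starting values
    for _ in range(iterations):
        z = []
        for xi, yi in zip(x, y):
            eta = a + b * xi
            z.append(eta + yi / math.exp(eta) - 1.0)  # working response
        a, b = ols_fit(x, z)
    return a, b

# Simulated (hypothetical) data, heteroscedastic in the continuous x
rng = random.Random(1999)
n = 20000
x = [rng.uniform(0, 2) for _ in range(n)]
y = [math.exp(1 + 0.5 * xi + rng.gauss(0, math.sqrt(0.2 + 0.4 * xi)))
     for xi in x]

a_ols, b_ols = ols_fit(x, [math.log(yi) for yi in y])
a_glm, b_glm = gamma_log_glm(x, y)

observed_mean = sum(y) / n
ols_mean = sum(math.exp(a_ols + b_ols * xi) for xi in x) / n
glm_mean = sum(math.exp(a_glm + b_glm * xi) for xi in x) / n
```

In this design even the slope differs between the two approaches, so no single smearing factor applied to the least-squares fit would recover E(y|x). In Stata the corresponding fit would be something like -glm y x, family(gamma) link(log)-, with y and x standing in for the actual variable names.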

For some discussion of this, see Blough et al. in the Journal of Health Economics (April 1999) for a case with the additional wrinkle of a large zero mass. If there is no zero mass problem, Robert could consider the second part of Blough's model.

Will


At 09:56 AM 12/9/2002 +0000, you wrote:

Robert Saunders
>
> I've been tinkering with log-transforming an outcome variable for a
> regression, but I thought lnskew0 might be a good trick (and it was
> doing better than ln()).  However, I wonder how I could convert the
> estimates back to the natural units.  For example, I've seen the
> smearing technique for converting regression estimates scaled in
> ln(dollars) back to dollars, but I can't imagine what's involved in
> getting back from whatever it is lnskew0 creates.  Then I thought,
> somebody on STATAlist might know.  [Couldn't find anything in the
> list archives or manual.]

Without very much context, it is difficult to advise, but there
are issues here on various levels.

1. On a purely algebraic level, the back transformation corresponding
to t = ln(y - k) is exp(t) + k. Note that the constant k is left
behind by -lnskew0- as r(gamma).

2. Removal of bias produced by transformation is not quite so
straightforward. In the case of smearing, for example, Duan's
original paper makes clear that the smearing idea leads to a
very simple recipe for simple log transformation but typically
a messy recipe for other transformations. See

Duan, N. 1983. Smearing estimate: a nonparametric retransformation
method.  Journal of the American Statistical Association 78: 605-610.

I'm aware of two Stata programs for smearing, Richard Goldstein's
-predlog- (STB-29) and my own -smear- (unpublished), but both
concentrate entirely on log transformation (and to that
extent the name -smear- of mine is a misnomer). To do smearing
as an antidote to ln(y - k), you would need to write your own
code, I believe.

3. ln(y - k) will be less skewed than ln(y) in almost all cases,
but I wish you luck in finding a systematic,
scientific interpretation of k. Whenever, as here, there is concern
for getting predictions in the original metric, generalised
linear models offer, in my view, a far superior approach.
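As a rough illustration of points 1 and 2, the sketch below applies the general smearing idea (average the back-transformed value of xb plus each residual) to t = ln(y - k). It is not taken from -predlog- or -smear-. It treats k as a known constant, which ignores the fact that -lnskew0- estimates it, and it assumes homoscedastic errors. The values of k, xb and the residuals are all hypothetical.

```python
import math
import random

def back_transform(t, k):
    """Inverse of t = ln(y - k)."""
    return math.exp(t) + k

def smeared_prediction(xb, residuals, k):
    """Average the back-transformed (xb + residual) over all residuals."""
    return sum(back_transform(xb + r, k) for r in residuals) / len(residuals)

# Hypothetical shift; in Stata this would be read from r(gamma)
k = -3.5

# Hypothetical residuals, centred as least-squares residuals would be
rng = random.Random(78)
residuals = [rng.gauss(0, 0.6) for _ in range(2000)]
mean_r = sum(residuals) / len(residuals)
residuals = [r - mean_r for r in residuals]

xb = 2.0                                   # a fitted value on the ln(y - k) scale
naive = back_transform(xb, k)              # exp(xb) + k, no bias correction
smeared = smeared_prediction(xb, residuals, k)
```

With k held fixed, the average works out to k + exp(xb) times the mean of the exponentiated residuals, so the shift itself is not scaled by the smearing factor. Whether treating k as fixed is defensible is a separate question.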

Nick
n.j.cox@durham.ac.uk

_____________________________________________________________


################################
Robert C. Saunders, M.P.P.
Box 90 GPC, Vanderbilt University
Nashville, Tennessee
___________________________________________________________
Willard G. Manning
Professor
Dept. of Health Studies -- MC 2007
The University of Chicago
5841 So. Maryland Ave.
Chicago, IL  60637

Phone:  773-834-1971 (direct)
        773-702-2453 (department)

Fax:    773-702-1979

E-mail: w-manning@uchicago.edu

