
From: Willard Manning <w-manning@uchicago.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: lnskew question
Date: Mon, 09 Dec 2002 14:17:49 -0600

Robert, Nick

Setting aside the k term, the retransformation problem is actually more complex than the issue of whether to use a parametric or less parametric retransformation factor. If log(y) = xb + e, with E(x'e) = 0, then the expectation of y given x is

E(y|x) = [exp(xb)][E(exp(e))].

The last term is consistently estimated by Duan's smearing coefficient (the sample mean of the exponentiated log-scale residuals) if the error e is homoscedastic in x. But if the error is heteroscedastic in some of the covariates, then E(exp(e)) depends on x, and the retransformation needs additional terms to reflect the heteroscedasticity.
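A minimal sketch of the homoscedastic case, in plain Python rather than Stata so that it runs without any packages. The data are simulated and the helper names (`ols_fit`, `smearing_factor`) are made up for illustration; the smearing factor is the sample mean of exp(residual) from the log-scale regression.

```python
import math
import random

def ols_fit(x, z):
    """Least-squares intercept and slope for z = a + b*x + e."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return a, b

def smearing_factor(residuals):
    """Smearing factor: the sample mean of exp(residual)."""
    return sum(math.exp(r) for r in residuals) / len(residuals)

# Simulated data (an assumption for illustration only):
# log(y) = 1 + 0.5*x + e, with e ~ N(0, 0.8^2), homoscedastic in x.
random.seed(12345)
n = 5000
x = [random.uniform(0, 2) for _ in range(n)]
y = [math.exp(1.0 + 0.5 * xi + random.gauss(0, 0.8)) for xi in x]

# Regress log(y) on x and keep the log-scale residuals.
log_y = [math.log(yi) for yi in y]
a, b = ols_fit(x, log_y)
resid = [zi - (a + b * xi) for xi, zi in zip(x, log_y)]
smear = smearing_factor(resid)

naive = [math.exp(a + b * xi) for xi in x]   # exp(xb) alone, ignoring E(exp(e))
smeared = [smear * p for p in naive]         # exp(xb) times the smearing factor

mean_y = sum(y) / n
mean_naive = sum(naive) / n
mean_smeared = sum(smeared) / n
print(f"smearing factor: {smear:.3f}")
print(f"mean y: {mean_y:.3f}  naive: {mean_naive:.3f}  smeared: {mean_smeared:.3f}")
```

Because the residuals from a regression with an intercept average zero, the smearing factor is at least 1, so exp(xb) on its own understates the mean of y.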

The easiest case to see is where e is normally distributed with log-scale variance v(x). Then we get,

E(y|x) = exp(xb + 0.5 v(x))
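The normal-theory formula can be checked numerically. The sketch below is only an illustration: the variance function `v_of_x` and the coefficients are hypothetical, picked so that the log-scale variance rises with x, and the closed form exp(xb + 0.5 v(x)) is compared with a simulated mean at three values of x.

```python
import math
import random

def lognormal_mean(xb, v):
    """E(y|x) when log(y) = xb + e and e ~ N(0, v), where v = v(x)."""
    return math.exp(xb + 0.5 * v)

def v_of_x(x):
    # Hypothetical log-scale variance function, for illustration only.
    return 0.2 + 0.3 * x

random.seed(2002)
draws = 100_000
for x in (0.0, 1.0, 2.0):
    xb = 1.0 + 0.5 * x          # hypothetical linear predictor
    v = v_of_x(x)
    sd = math.sqrt(v)
    simulated = sum(math.exp(xb + random.gauss(0, sd)) for _ in range(draws)) / draws
    print(f"x={x:.0f}  closed form={lognormal_mean(xb, v):.3f}  simulated={simulated:.3f}")
```

The correction factor exp(0.5 v(x)) changes with x here, which is why a single pooled smearing factor would be too small at some values of x and too large at others.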

If the error is not normal but is heteroscedastic by a categorical variable, then smearing by subgroup will do.
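Smearing by subgroup amounts to computing a separate smearing factor within each category and applying it to that category's predictions. A rough sketch with toy residuals follows; the function names and the group labels "a" and "b" are invented for the example.

```python
import math
from collections import defaultdict

def smearing_by_group(residuals, groups):
    """Return {group: mean of exp(residual) within that group}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(residuals, groups):
        sums[g] += math.exp(r)
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def retransform(xb, groups, factors):
    """Predicted E(y|x): exp(xb) times the smearing factor of each observation's group."""
    return [math.exp(v) * factors[g] for v, g in zip(xb, groups)]

# Toy log-scale residuals: group "b" is more dispersed than group "a".
residuals = [-0.1, 0.1, -0.2, 0.2, -1.0, 1.0, -1.5, 1.5]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

factors = smearing_by_group(residuals, groups)
predictions = retransform([0.0, 0.0], ["a", "b"], factors)
print(factors)
print(predictions)
```

The more dispersed group gets the larger factor, so two observations with the same xb receive different predictions on the raw scale.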

If the error is heteroscedastic in a continuous variable or in several variables, then one alternative is to use a GLM with a log link (possibly with the gamma family). As long as the specification of the link and the covariates is correct, the GLM estimates of E(y|x) will be consistent. If the family (for example, the gamma) is incorrect, then the loss is one of precision. The appropriate family for the GLM depends on the relationship between the mean and variance functions in the application.

Note that the GLM predictions can be retransformed by simple exponentiation because the GLM estimates log(E(y|x)), whereas least squares on log y estimates E(log(y)|x).
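The distinction shows up even with no covariates. In an intercept-only model, a log-link GLM (gamma or Poisson family, for instance) fits the sample mean of y, so its intercept is log(mean(y)), while least squares on log y gives mean(log(y)). The sketch below uses simulated data as an assumption and computes both quantities directly rather than calling a GLM routine.

```python
import math
import random

random.seed(7)
# Illustrative skewed outcome: log(y) ~ N(2, 1)
y = [math.exp(random.gauss(2.0, 1.0)) for _ in range(20000)]

mean_y = sum(y) / len(y)
glm_intercept = math.log(mean_y)                       # estimates log(E(y))
ols_intercept = sum(math.log(v) for v in y) / len(y)   # estimates E(log(y))

glm_prediction = math.exp(glm_intercept)   # reproduces the sample mean of y
ols_prediction = math.exp(ols_intercept)   # geometric mean, which understates E(y)

print(f"sample mean of y:       {mean_y:.3f}")
print(f"exp(GLM intercept):     {glm_prediction:.3f}")
print(f"exp(log-OLS intercept): {ols_prediction:.3f}")
```

Exponentiating the log-OLS intercept returns the geometric mean, which by Jensen's inequality cannot exceed the arithmetic mean; that gap is what the smearing factor is meant to close.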

For some discussion of this, see Blough et al. in the Journal of Health Economics (April 1999), which treats a case with the additional wrinkle of a large zero mass. If there is no zero mass problem, Robert could consider the second part of Blough's model.

Will

At 09:56 AM 12/9/2002 +0000, you wrote:

Robert Saunders

> I've been tinkering with log-transforming an outcome variable for a regression, but I thought lnskew0 might be a good trick (and it was doing better than ln()). However, I wonder how I could convert the estimates back to the natural units. For example, I've seen the smearing technique for converting regression estimates scaled in ln(dollars) back to dollars, but I can't imagine what's involved in getting back from whatever it is lnskew0 creates. Then I thought, somebody on STATAlist might know. [Couldn't find anything in the list archives or manual.]

Without very much context, it is difficult to advise, but there are issues here on various levels.

1. On a purely algebraic level, the back transformation corresponding to t = ln(y - k) is exp(t) + k. Note that the constant k is left behind by -lnskew0- as r(gamma).

2. Removal of bias produced by transformation is not quite so straightforward. In the case of smearing, for example, Duan's original paper makes clear that the smearing idea leads to a very simple recipe for simple log transformation but typically a messy recipe for other transformations. See

   Duan, N. 1983. Smearing estimate: a nonparametric retransformation method. Journal, American Statistical Association 78: 605-610.

   I'm aware of two Stata programs for smearing, Richard Goldstein's -predlog- (STB-29) and my own -smear- (unpublished), but both concentrate entirely on log transformation (and to that extent the name -smear- of mine is a misnomer). To do smearing as an antidote to ln(y - k), you would need to write your own code, I believe.

3. ln(y - k) will be less skew than ln(y) in almost all cases but I wish you luck in finding a systematic, scientific interpretation of k.

Whenever, as here, there is concern for getting predictions in the original metric, generalised linear models offer, in my view, a far superior approach.

Nick
n.j.cox@durham.ac.uk

_____________________________________________________________

Hi,

I've been tinkering with log-transforming an outcome variable for a regression, but I thought lnskew0 might be a good trick (and it was doing better than ln()). However, I wonder how I could convert the estimates back to the natural units. For example, I've seen the smearing technique for converting regression estimates scaled in ln(dollars) back to dollars, but I can't imagine what's involved in getting back from whatever it is lnskew0 creates. Then I thought, somebody on STATAlist might know. [Couldn't find anything in the list archives or manual.]

Thanks,
r

################################
Robert C. Saunders, M.P.P.
Box 90 GPC, Vanderbilt University
Nashville, Tennessee

___________________________________________________________

Willard G. Manning
Professor
Dept. of Health Studies -- MC 2007
The University of Chicago
5841 So. Maryland Ave.
Chicago, IL 60637
Phone: 773-834-1971 (direct)
       773-702-2453 (department)
Fax: 773-702-1979
E-mail: w-manning@uchicago.edu

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
