
**From:** "Nick Cox" <n.j.cox@durham.ac.uk>

**To:** <statalist@hsphsun2.harvard.edu>

**Subject:** RE: st: Correction for bias in regression estimates after log transformation

**Date:** Wed, 17 Dec 2008 13:06:19 -0000

The issue, as I understand it, arises for the response y because the mean of log(y) differs from the log of mean(y). What you do to the predictors is immaterial. The problem is generic to any nonlinear transformation.

I see two main, relatively simple ways of tackling this problem. (There are other, more complicated methods; my experience, such as it is, indicates that they don't give very different results except when the results are highly dubious anyway.)

1. Avoid it altogether by using -glm- with an appropriate link.

2. Use smearing. Richard Goldstein implemented -predlog-, which includes smearing, in 1996:

        STB-29  sg48  .  Predictions in the original metric for log-transformed models
                (help predlog if installed) . . . . . . . . . . . .  R. Goldstein
                1/96    pp.27--29; STB Reprints Vol 5, pp.145--147
                calculates three different retransformations, which allow
                obtaining predictions in the original metric

Both the software and the original article are accessible to all.

You can almost do smearing by hand, but here is a slightly more polished version:

    *! NJC 2.1.0 8 January 2005
    * NJC 1.0.0 13 September 2002
    program smear, rclass
        version 8.0
        syntax [if] [in] [, Generate(str) OUTofsample ]

        * generate() must name a new variable
        if "`generate'" != "" {
            capture confirm new variable `generate'
            if _rc {
                di as err "option syntax is generate(newvar)"
                exit _rc
            }
        }

        marksample touse
        qui count if `touse'
        if r(N) == 0 error 2000

        tempvar resid yhatraw
        tempname rmse cf

        qui {
            * will exit with error message if no estimates
            scalar `rmse' = e(rmse)
            * linear prediction on the log scale
            if "`outofsample'" != "" predict double `yhatraw'
            else predict double `yhatraw' if e(sample)
            * smearing factor = mean of exponentiated residuals,
            * taken over the estimation sample only
            predict double `resid' if e(sample), res
            replace `resid' = exp(`resid')
            su `resid', meanonly
            scalar `cf' = r(mean)
            * smeared prediction in the original metric
            if "`generate'" != "" {
                gen double `generate' = exp(`yhatraw') * `cf' if `touse'
                la var `generate' "smeared retransformation"
            }
        }

        di as res scalar(`cf')
        return scalar smearcf = `cf'
    end

There is more discussion in

N.J. Cox, J. Warburton, A. Armstrong and V.J. Holliday. 2008. Fitting concentration and load rating curves with generalised linear models. Earth Surface Processes and Landforms 33: 25-39 (doi: 10.1002/esp.1523)

which may be accessible to you.

Nick
n.j.cox@durham.ac.uk

Maarten Buis

--- "Loncar, Dejan" <LoncarD@unaids.org> wrote:
> I have transformed the variables using log function before
> regression.
>
> Do you know by any chance which function in Stata or some ado file
> can perform antilog transformation after regression with correction
> for bias in regression estimates?

Bias means nothing more than that your estimates don't mean what you think they mean. So there are two ways of addressing bias: either you change the interpretation of the results so that it corresponds to the estimate, or you change your estimate so that it measures what you think it does.

Another consequence is that there is no such thing as a biased estimate per se: you always need to specify what the estimate is a biased estimate of. Trivially, all estimates are biased estimates of most concepts (e.g. the annual tea consumption of Burundi is a biased estimate of the number of ants per square inch in Amsterdam), and at the same time all estimates are unbiased estimates of the thing that they measure (though the thing they measure may not be of interest).

The distinction between changing the interpretation and changing the estimate is nicely illustrated by a log-transformed dependent variable. If you first transform the dependent variable and then perform a regular regression, you can interpret the exponentiated coefficients as ratios of geometric means, but not as ratios of arithmetic means. You can get estimates in terms of ratios of arithmetic means when you use -glm- on the untransformed dependent variable with the -link(log)- option. So if you are interested in the effect on the geometric mean, then -glm- will give you biased estimates. You can solve this either by changing your interpretation of the results to the effect in terms of the arithmetic mean, or by estimating your model with -regress-.

I have discussed a detailed example of this issue here: http://www.stata.com/statalist/archive/2008-11/msg00137.html

Also see:

Roger Newson (2003) Stata tip 1: The eform() option of regress. The Stata Journal 3(4): 445. http://stata-journal.com/article.html?article=st0054
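The retransformation issue can be written out in a few lines. This is a sketch under stated assumptions: an additive error on the log scale with zero conditional mean, and, for the smearing step only, errors independent of the predictors (the assumption behind Duan's smearing estimator). Here $\tilde{x}$ denotes $x$ with its $k$-th predictor increased by one unit, other predictors held fixed. Exponentiating the -regress- linear predictor gives a conditional geometric mean. The conditional arithmetic mean needs the extra factor $E\{\exp(\varepsilon) \mid x\}$. Smearing estimates that factor by the mean of the exponentiated residuals, which is the quantity the -smear- program returns in r(smearcf). By contrast, -glm- with a log link models the arithmetic mean directly.

```latex
\begin{aligned}
\log y_i &= x_i'\beta + \varepsilon_i, \qquad E(\varepsilon_i \mid x_i) = 0 \\
\exp\{E(\log y \mid x)\} &= \exp(x'\beta)
  && \text{conditional geometric mean of } y \\
\frac{\exp(\tilde{x}'\beta)}{\exp(x'\beta)} &= \exp(\beta_k)
  && \text{ratio of geometric means} \\
E(y \mid x) &= \exp(x'\beta)\, E\{\exp(\varepsilon) \mid x\} \;\ge\; \exp(x'\beta)
  && \text{Jensen's inequality} \\
\hat{c} &= \frac{1}{n}\sum_{i=1}^{n} \exp(\hat{\varepsilon}_i), \qquad
  \hat{y} = \exp(x'\hat{\beta})\,\hat{c}
  && \text{smearing, } \varepsilon \text{ independent of } x \\
\log E(y \mid x) &= x'\gamma \;\Rightarrow\;
  \frac{E(y \mid \tilde{x})}{E(y \mid x)} = \exp(\gamma_k)
  && \text{glm, link(log): ratio of arithmetic means}
\end{aligned}
```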

**Follow-Ups**:
- **Re: st: Correction for bias in regression estimates after log transformation**
  - *From:* Richard Goldstein <richgold@ix.netcom.com>

**References**:
- **st: Correction for bias in regression estimates after log transformation**
  - *From:* "Loncar, Dejan" <LoncarD@unaids.org>
- **Re: st: Correction for bias in regression estimates after log transformation**
  - *From:* Maarten buis <maartenbuis@yahoo.co.uk>

