Re: st: retransformation of ln(Y) coefficient and CI in regression

From   Roger Newson <>
To   "" <>
Subject   Re: st: retransformation of ln(Y) coefficient and CI in regression
Date   Mon, 6 Jun 2011 10:31:46 +0100

The -regress- command has an -eform- option, which gives the confidence limits of geometric means and their ratios. This is described in Newson (2003), and can be used together with -robust- to display unequal-variance confidence limits.
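
To spell out why the exponentiated coefficients are geometric means or ratios of geometric means, here is a short derivation. The notation (n_j, \hat\beta_j, GM_j, t_\nu) is introduced here for illustration only and is an assumption, not part of -regress- output or of Newson (2003).

```latex
% Model with no constant and one indicator per level j of the factor:
% each coefficient is the mean of ln(Y) in that level.
\[
\hat\beta_j = \frac{1}{n_j} \sum_{i:\,\mathrm{factor}_i = j} \ln Y_i
\quad\Longrightarrow\quad
\exp(\hat\beta_j)
  = \Bigl(\prod_{i:\,\mathrm{factor}_i = j} Y_i\Bigr)^{1/n_j}
  = \mathrm{GM}_j .
\]

% Confidence limits are computed on the log scale, then exponentiated:
\[
\exp\bigl(\hat\beta_j \pm t_{\nu,\,0.975}\,\widehat{\mathrm{SE}}(\hat\beta_j)\bigr).
\]

% Model with a constant (written \hat\beta_0 here) and baseline level 0:
\[
\exp(\hat\beta_0) = \mathrm{GM}_0 , \qquad
\exp(\hat\beta_j) = \mathrm{GM}_j / \mathrm{GM}_0 \quad (j \neq 0).
\]
```

The exponentiated limits are asymmetric about the geometric mean, and they refer to geometric means, not to arithmetic means of Y.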

And, if you want to plot the confidence limits against the factor values, then you might like to use the -parmest-, -eclplot-, -fvregen- and -descsave- packages, downloadable from SSC. As in:

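* Temporary file to hold the do-file written by -descsave-
tempfile df0
* Save the attributes of -factor- so that they can be restored later
descsave factor, do(`"`df0'"', replace)
* One geometric mean per level of -factor-, with robust confidence limits
regress lnY ibn.factor, vce(robust) noconst eform(GM/Ratio)
* Replace the data in memory with one observation per parameter
parmest, norestore eform
* Rebuild -factor- from the parameter names
fvregen, do(`"`df0'"')
* Plot the estimates and confidence limits against -factor-
eclplot estimate min* max* factor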

In this example, we start by defining a temporary file whose macro name is -df0-. We then use -descsave- (an extended version of -describe- which can create output do-files) to write a do-file to that temporary file, defining the variable attributes (storage type, format, variable label and value label) of the variable -factor-. We then use -regress-, with the -eform(GM/Ratio)- option to specify confidence limits for geometric means and/or their ratios, and with the -noconst- option and the X-variable list -ibn.factor- to specify that the parameters will be geometric means instead of ratios. We then use -parmest- to overwrite the existing dataset in memory with an output dataset (or resultsset), with one observation per parameter and data on parameter names, estimates, confidence limits and other parameter attributes. In this new output dataset, we then use -fvregen- to regenerate the variable -factor- from the parameter names. Finally, we use -eclplot- to produce a confidence interval plot, with the values of -factor- on the X-axis and the estimates and unequal-variance confidence limits for the corresponding geometric means on the Y-axis. More about all these packages can be found in the on-line help for -parmest-, which contains many hypertext references.
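
If only the numbers are needed and not the plot, a rough equivalent using official Stata commands alone might look like the sketch below. It assumes the variable names from the original question (-lnY- and -factor-) and that -factor- is coded 0, 1 and 2 with 0 as the base level; adjust the level numbers if the coding differs.

```stata
* Parameterization 1: no constant, one parameter per level of -factor-.
* Each exponentiated coefficient is a geometric mean with robust limits.
regress lnY ibn.factor, noconstant vce(robust) eform(GM)

* Parameterization 2: constant plus contrasts, as in the original question.
regress lnY i.factor, vce(robust)

* Geometric mean for level 1 is exp(_cons + 1.factor). -lincom- uses the
* standard error of the sum, then -eform- exponentiates estimate and limits.
lincom _cons + 1.factor, eform

* Same for level 2 (assumed coding; see the note above).
lincom _cons + 2.factor, eform
```

Substituting the separate confidence limits of -_cons- and -1.factor- into exp(_cons + 1.factor) ignores the covariance between the two estimates, which is typically negative in a model of this kind, and so tends to give an interval that is too wide. Working with the standard error of the sum, as -lincom- does, avoids this.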

I hope this helps.

Best wishes



Newson R. Stata tip 1: The eform() option of regress. The Stata Journal 2003; 3(4): 445. Download from

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Web page:
Departmental Web page:

Opinions expressed are those of the author, not of the institution.

On 05/06/2011 16:26, Steve Rothenberg wrote:
I have a simple model with a natural log dependent variable and a
three-level factor predictor.  I’ve used

  . regress lnY i.factor, vce(robust)

to obtain estimates in the natural log metric.  I want to be able to display
the results in a graph as means and 95% CI for each level of the factor with
retransformed units in the original Y metric.

I’ve also calculated geometric means and 95% CI for each level of the factor
variable using

. ameans Y if factor==x

simply as a check, though the 95% CI is not adjusted for the vce(robust)
standard error as calculated by the -regress- model.

Using naïve transformation (i.e. ignoring retransformation bias) with

. display exp(coefficient)

from the output of -regress- for each level of the predictor, with the
classic formulation:

Level 0 = exp(constant)
Level 1 = exp(constant+coef(1))
Level 2 = exp(constant+coef(2))

the series of retransformations from the -regress- command is the same as
the geometric means from the series of -ameans- commands.

When I try to do the same with the lower and upper 95% CI (substituting the
limits of the 95% CI for the coefficients) from the -regress- command,
however, the retransformed CI is much larger than calculated from the
-ameans- command, much more so than the differences in standard errors from
-regress- with and without the vce(robust) option would indicate.

I’ve discovered -levpredict- for unbiased retransformation of log dependent
variables in regression-type estimations by Christopher Baum in SSC but it
only outputs the bias-corrected means from the preceding -regress-.  To be
sure, there is some small bias in the first or second decimal place of the
mean factor levels compared to naïve retransformation.

Am I doing something wrong by treating the 95% CI of each level of the
factor variable in the same way I treat the coefficients without correcting
for retransformation bias?  Is there any way I can obtain either the
retransformed CI or the bias-corrected retransformed CI for the different
levels of the factor variable in the original metric of Y?

I'd like to retain the robust SE from the above estimation as there is
considerable difference in variance in each level of the factor variable.

Steve Rothenberg
National Institute of Public Health
Cuernavaca, Morelos, Mexico

Stata/MP 11.2 for Windows (32-bit)
Born 30 Mar 2011
