Re: st: retransformation of ln(Y) coefficient and CI in regression

From	Roger Newson <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: retransformation of ln(Y) coefficient and CI in regression
Date	Mon, 6 Jun 2011 10:31:46 +0100

The -regress- command has an -eform- option, which gives the confidence limits of geometric means and their ratios. This is described in Newson (2003), and can be used together with -robust- to display unequal-variance confidence limits.
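
As a rough sketch of what these limits are (assuming the usual t-based interval computed on the log scale and then exponentiated; the symbols below are illustrative notation, not taken from the Stata documentation): if \hat\beta_k is the coefficient for level k in a model fitted to ln(Y) without a constant, then it is the mean of ln(Y) in that level, and

```latex
\[
\widehat{GM}_k = \exp\bigl(\hat\beta_k\bigr),
\qquad
\mathrm{CI}_k =
\Bigl[\,
\exp\bigl(\hat\beta_k - t_{1-\alpha/2,\,\nu}\,\widehat{SE}(\hat\beta_k)\bigr),\;
\exp\bigl(\hat\beta_k + t_{1-\alpha/2,\,\nu}\,\widehat{SE}(\hat\beta_k)\bigr)
\Bigr]
\]
```

Here \nu is the residual degrees of freedom and \widehat{SE} is the (robust) standard error. If the model has a constant and a reference level 0, then exp(\hat\beta_k) for k > 0 is instead the ratio of the geometric mean in level k to the geometric mean in level 0.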

And, if you want to plot the confidence limits against the factor values, then you might like to use the -parmest-, -eclplot-, -fvregen- and -descsave- packages, downloadable from SSC. As in:


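* Temporary file to hold a do-file describing -factor-
tempfile df0
* Save the attributes of -factor- (type, format, labels) to that do-file
descsave factor, do(`"`df0'"', replace)
* One geometric mean per factor level, with robust confidence limits
regress lnY ibn.factor, vce(robust) noconst eform(GM/Ratio)
* Replace the data in memory with one observation per parameter
parmest, norestore eform
* Rebuild -factor- from the parameter names
fvregen, do(`"`df0'"')
* Plot estimates and confidence limits against -factor-
eclplot estimate min* max* factor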

In this example, we start by defining a temporary file whose macro name is -df0-. We then use -descsave- (an extended version of -describe- which can create output do-files) to write a do-file to that temporary file, defining the variable attributes (storage type, format, variable label and value label) of the variable -factor-. We then use -regress-, with the -eform(GM/Ratio)- option to specify confidence limits for geometric means and/or their ratios, and the -noconst- option and the X-variable list -ibn.factor- to specify that the parameters will be geometric means instead of ratios. We then use -parmest- to overwrite the existing dataset in memory with an output dataset (or resultsset), with 1 observation per parameter and data on parameter names, estimates, confidence limits and other parameter attributes. In this new output dataset, we then use -fvregen- to regenerate the variable -factor- from the parameter names. Finally, we use -eclplot- to produce a confidence interval plot, with the values of -factor- on the X-axis and the estimates and unequal-variance confidence limits for the corresponding geometric means on the Y-axis. More about all these packages can be found in the on-line help for -parmest-, which contains many hypertext references.
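
For anyone who wants to try the idea without installing the SSC packages, the following is a minimal sketch using only official Stata commands. It assumes the built-in -auto- dataset, with -price- standing in for Y and -foreign- standing in for the factor; neither comes from the original question. It also shows one way to get a per-level interval from the reference-level model in the question: -lincom- computes the standard error of the sum _cons + coefficient, which includes the covariance between the two estimates. Adding the separate confidence limits of the constant and the coefficient ignores that covariance and cannot give a narrower interval than the one for the sum, which may explain the inflated limits described in the question.

```stata
* Assumption: the built-in auto data; price and foreign are stand-ins for Y and factor
sysuse auto, clear
generate double lnY = ln(price)

* Cell-means parameterization: one geometric mean per level, with robust limits
regress lnY ibn.foreign, noconstant vce(robust) eform(GM)

* Reference-level parameterization, as in the original question
regress lnY i.foreign, vce(robust)

* Geometric mean for level 1: exponentiated sum, with limits based on the SE of the sum
lincom _cons + 1.foreign, eform

* Rough check against -ameans- for the same level
ameans price if foreign == 1
```

The point estimates from -lincom- and -ameans- should agree, but the limits need not match exactly, because -ameans- uses only the within-level variance and degrees of freedom, whereas -regress- with -vce(robust)- uses the residual degrees of freedom of the whole model.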


I hope this helps.

Best wishes

Roger


References

Newson R. Stata tip 1: The eform() option of regress. The Stata Journal 2003; 3(4): 445. Download from
http://www.stata-journal.com/article.html?article=st0054

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

On 05/06/2011 16:26, Steve Rothenberg wrote:

I have a simple model with a natural log dependent variable and a
three-level factor predictor. I’ve used

  . regress lnY i.factor, vce(robust)

to obtain estimates in the natural log metric.  I want to be able to display
the results in a graph as means and 95% CI for each level of the factor with
retransformed units in the original Y metric.

I’ve also calculated geometric means and 95% CI for each level of the factor
variable using

. ameans Y if factor==x

simply as a check, though the 95% CI is not adjusted for the vce(robust)
standard error as calculated by the -regress- model.

Using naïve transformation (i.e. ignoring retransformation bias) with

. display exp(coefficient)

from the output of -regress- for each level of the predictor, with the
classic formulation:

Level 0 = exp(constant)
Level 1 = exp(constant+coef(1))
Level 2 = exp(constant+coef(2))

the series of retransformations from the -regress- command is the same as
the geometric means from the series of -ameans- commands.

When I try to do the same with the lower and upper 95% CI (substituting the
limits of the 95% CI for the coefficients) from the -regress- command,
however, the retransformed CI is much larger than calculated from the
-ameans- command, much more so than the differences in standard errors from
-regress- with and without the vce(robust) option would indicate.

I’ve discovered -levpredict-, by Christopher Baum on SSC, for unbiased
retransformation of log dependent variables in regression-type estimations,
but it only outputs the bias-corrected means from the preceding -regress-.
To be sure, there is some small bias, in the first or second decimal place,
in the mean factor levels compared to naïve retransformation.

Am I doing something wrong by treating the 95% CI of each level of the
factor variable in the same way I treat the coefficients without correcting
for retransformation bias?  Is there any way I can obtain either the
retransformed CI or the bias-corrected retransformed CI for the different
levels of the factor variable in the original metric of Y?

I'd like to retain the robust SE from the above estimation as there is
considerable difference in variance in each level of the factor variable.

Steve Rothenberg
National Institute of Public Health
Cuernavaca, Morelos, Mexico

Stata/MP 11.2 for Windows (32-bit)
Born 30 Mar 2011


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


