
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: visual guide to variable transformations?

From   Lloyd Dumont
Subject   Re: st: visual guide to variable transformations?
Date   Tue, 12 Jun 2012 10:25:04 -0700 (PDT)

Thank you, Cameron, Nick, Austin, and David for these suggestions.  Definitely enough to get me on my way.  Lloyd

----- Original Message -----
From: Cameron McIntosh
Sent: Thursday, June 7, 2012 9:52 PM
Subject: RE: st: visual guide to variable transformations?


Not exactly what you're searching for either, but if you're thinking about transforming variables in the regression context, I might also recommend taking a look at:

Ip, W.C., Wong, H., Wang, S.-G., & Jia, Z.-Z. (2004). A GIC rule for assessing data transformation in regression. Statistics & Probability Letters, 68(1), 105–110.

Cheng, T.-C. (2005). Robust regression diagnostics with data transformations. Computational Statistics & Data Analysis, 49(3), 875–891.

da Silva, M.V., Van Tassell, C.P., Sonstegard, T.S., Cobuci, J.A., & Gasbarre, L.C. (2012). Box–Cox Transformation and Random Regression Models for Fecal Egg Count Data. Frontiers in Genetics, 2: 112.

Riani, M., & Atkinson, A.C. (2000). Robust Diagnostic Data Analysis: Transformations in Regression. Technometrics, 42(4), 384–394.

Dastan, A., & Horne, R.N. (2011). Robust Well-Test Interpretation by Using Nonlinear Regression With Parameter and Data Transformations. SPE Journal, 16(3), 698–712.

Stöckl, D., & Thienpont, L.M. (2008). Introduction of non-linearity by data transformation in method comparison and commutability studies. Clinical Chemistry and Laboratory Medicine, 46(12), 1784–1785.

Zhou, X.-H., Lin, H., & Johnson, E. (2008). Non-parametric heteroscedastic transformation regression models for skewed data with an application to health care costs. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 70(5), 1029–1047.


> Date: Fri, 8 Jun 2012 00:55:11 +0100
> Subject: Re: st: visual guide to variable transformations?
> I agree with Austin and David. The business of why, when and how to
> transform is rather too complicated to reduce easily to a very concise
> statement. Nevertheless I wrote a Stata-linked guide to
> transformations that is downloadable as a help file. It can be found
> on SSC at -transint-.
> David is too modest to underline that some of the best expository
> material on transformations is still that to be found in a book he
> co-edited with Frederick Mosteller and John Tukey:
> not to mention the evergreen
> Hoaglin, D.C. 1988. Transformations in everyday experience. Chance
> 1(4): 40--45.
> Nick
> Austin Nichols
> > Why not? Such advice would be generically incorrect.
> > You are assuming only a bivariate relationship among continuous variables,
> > but even in such a restricted setting, linearity and normality are far
> > from required, and it is unclear how you would discern from
> > most scatterplots how to get there even if they were indicated.
> >
> > E.g.
> >
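> > clear all
> > set seed 1
> > * z and e: independent standard normal draws
> > drawnorm z e, n(1000)
> > * x = Phi(z), so x is uniform on (0,1)
> > g x=normal(z)
> > * multiplicative lognormal error: E[y|x] is linear in x, but y is skewed and heteroskedastic
> > g y=x*exp(e)
> > lpoly y x
> > * additive normal error with mean exp(e) and standard deviation x
> > g y2=x+rnormal(exp(e),x)
> > lpoly y2 x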
> >
> > That said, a review of available -glm- links and
> > common -nl- specifications might make a good FAQ.
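
As one illustration of the -glm- links mentioned above, the sketch below contrasts a log-transformed response with a log link. It assumes Stata's shipped auto dataset and the variables price and weight, which are illustrative choices and not part of the thread; the gamma family is likewise an assumption made for a positive, right-skewed response.

```stata
* Illustrative sketch only: dataset, variables, and family are assumptions, not from the thread
sysuse auto, clear

* Option 1: transform the response, then fit by least squares
* (this models E[ln(price) | weight])
generate double lnprice = ln(price)
regress lnprice weight

* Option 2: leave the response on its own scale and use a log link
* (this models ln(E[price | weight]))
glm price weight, family(gamma) link(log)
```

The two fits answer slightly different questions: the first models the mean of log price, the second the log of mean price, so predictions from the first need care when they are taken back to the price scale.
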
> David Hoaglin
> Quite a lot has been written about transformations, including their
> role in regression modeling.  I'll have to look for material that
> approaches "a visual guide."
> For now, I would like to correct the misimpression that, after
> transformation, the data on an independent variable should resemble a
> normal distribution.  I would not transform an independent variable
> for that reason.
> In the context of a regression model, the main aim in transforming an
> independent variable is to promote linearity of the relation between
> the dependent variable and the independent variable (as you describe
> for Figure 1d).  Promoting linearity is also an important aim in
> transforming the dependent variable.  Also, if the model involves more
> than one independent variable, transforming the dependent variable may
> make the contributions of the independent variables more nearly
> additive (i.e., reduce or remove interactions among the independent
> variables).
> Another reason for transforming the dependent variable is to make
> residual variability more nearly constant across the range of that
> variable.  One usually checks on this by making various plots of
> residuals.
> Choosing transformations often requires thought.  It should not be
> reduced to a simple rule.  The transformations need to make sense in
> the context of the data.
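
The checks David describes can be sketched in Stata roughly as follows. The auto dataset, the variables price, weight, and mpg, and the choice of a log transformation are all assumptions made for illustration; the thread does not prescribe any of them.

```stata
* Illustrative sketch only: data, variables, and the log transformation are assumptions
sysuse auto, clear

* Fit on the raw scale, then plot residuals against fitted values
* to see whether residual variability is roughly constant
regress price weight mpg
rvfplot, yline(0)

* Check linearity in one predictor with an augmented
* component-plus-residual plot and a lowess smooth
acprplot weight, lowess

* Refit with a transformed response and look at the same plots again
generate double lnprice = ln(price)
regress lnprice weight mpg
rvfplot, yline(0)
acprplot weight, lowess
```

Comparing the two sets of plots shows whether the transformation made the relation straighter or the spread more even; whether it also makes sense for the data is still a matter of judgment.
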
> Lloyd Dumont
> >> Does anyone know of a visual guide to variable transformations?  I have seen many decent verbal explanations of whether, when, and specifically how to transform variables.  But is there a single resource that shows which transformation is appropriate when?  For example, something like...
> >>
> >> When an indep variable is distributed as it is in Figure 1a and is related to the dep var as shown here in Figure 1b, then you should use the _____ transformation.  Then, the transformed indep variable will be displayed as in Figure 1c (which I imagine will almost always be something like a normal distribution) and the relationship between the transformed variable and the dep var will be as displayed in Figure 1d (which I imagine will almost always be linear).
> >>
> >> Of course, it all gets a little more complicated if we start talking about transforming the dep var, though this sort of transformation could also easily be displayed and explained visually.
> >>
> >> Does anyone know of such a resource?  If not, why not?
> >>
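
Stata's official ladder-of-powers commands come close to part of what Lloyd asks for, at least for the distribution of a single variable. A minimal sketch, assuming the shipped auto dataset and the variable price (both illustrative choices, not from the thread):

```stata
* Illustrative sketch only: dataset and variable are assumptions
sysuse auto, clear

* Tabulate normality tests for each ladder-of-powers transformation of price
ladder price

* Histograms of each transformation, with normal densities overlaid
gladder price

* The same transformations shown as normal quantile plots
qladder price
```

These displays address only the shape of one variable's distribution; as David notes, that is not by itself a reason to transform an independent variable.
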

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index