
Re: st: visual guide to variable transformations?

From   Nick Cox
Subject   Re: st: visual guide to variable transformations?
Date   Fri, 8 Jun 2012 00:55:11 +0100

I agree with Austin and David. The business of why, when and how to
transform is rather too complicated to reduce easily to a very concise
statement. Nevertheless I wrote a Stata-linked guide to
transformations that is downloadable as a help file. It can be found
on SSC at -transint-.
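
For readers who have not installed anything from SSC before, the usual route is sketched below. It assumes a working internet connection from within Stata and nothing else.

```stata
* download the -transint- package from SSC
ssc install transint
* open the help file it installs
help transint
```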

David is too modest to underline that some of the best expository
material on transformations is still that to be found in a book he
co-edited with Frederick Mosteller and John Tukey,

not to mention the evergreen

Hoaglin, D.C. 1988. Transformations in everyday experience. Chance
1(4): 40--45.


Austin Nichols

> Why not? Such advice would be generically incorrect.
> You are assuming only a bivariate relationship among continuous variables,
> but even in such a restricted setting, linearity and normality are far
> from required, and it is unclear how you would discern from
> most scatterplots how to get there even if they were indicated.
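> E.g.
> clear all
> set seed 1
> * z and e: 1,000 draws of independent standard normals
> drawnorm z e, n(1000)
> * x is uniform on (0,1)
> g x=normal(z)
> * y is linear in x on average, with multiplicative lognormal error
> g y=x*exp(e)
> lpoly y x
> * y2 is x plus normal noise with mean exp(e) and sd x
> g y2=x+rnormal(exp(e),x)
> lpoly y2 x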
> That said, a review of available -glm- links and
> common -nl- specifications might make a good FAQ.
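
As a rough illustration of the -glm- point, the sketch below compares transforming the response with leaving it alone and using a log link. The simulated data, the coefficient values, and the choice of a gamma family are all assumptions made for the example, not anything proposed in the thread.

```stata
clear all
set seed 1
set obs 1000
generate x = runiform()
* assumed data-generating process: log-linear mean, multiplicative error
generate y = exp(0.5 + 1.5*x + rnormal(0, 0.3))

* option 1: transform the response, then fit by least squares
generate lny = ln(y)
regress lny x

* option 2: keep y on its own scale and model log E(y|x) through the link
glm y x, family(gamma) link(log)
```

Both fits should give a slope near 1.5, but they answer slightly different questions: the first models the mean of log y, the second the log of the mean of y, so the intercepts need not agree.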

David Hoaglin

Quite a lot has been written about transformations, including their
role in regression modeling.  I'll have to look for material that
approaches "a visual guide."

For now, I would like to correct the misimpression that, after
transformation, the data on an independent variable should resemble a
normal distribution.  I would not transform an independent variable
for that reason.

In the context of a regression model, the main aim in transforming an
independent variable is to promote linearity of the relation between
the dependent variable and the independent variable (as you describe
for Figure 1d).  Promoting linearity is also an important aim in
transforming the dependent variable.  Also, if the model involves more
than one independent variable, transforming the dependent variable may
make the contributions of the independent variables more nearly
additive (i.e., reduce or remove interactions among the independent
variables).
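
A minimal sketch of the additivity point, using simulated data in which the two predictors act multiplicatively on the response. The variable names, coefficients, and error standard deviation are assumptions chosen only for illustration.

```stata
clear all
set seed 2012
set obs 500
generate x1 = runiform()
generate x2 = runiform()
* effects are additive on the log scale, hence multiplicative in y itself
generate y = exp(1 + x1 + x2 + rnormal(0, 0.2))

* raw scale: the x1-by-x2 product term absorbs the multiplicative structure
regress y c.x1##c.x2

* log scale: the product term should be close to zero
generate lny = ln(y)
regress lny c.x1##c.x2
```

Comparing the coefficient on c.x1#c.x2 across the two fits shows how a transformation of the dependent variable can remove an apparent interaction.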

Another reason for transforming the dependent variable is to make
residual variability more nearly constant across the range of that
variable.  One usually checks on this by making various plots of
residuals.
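
One such check, sketched on simulated data: plot residuals against fitted values before and after taking logs of the dependent variable. The data-generating process here is an assumption picked so that the raw-scale residuals fan out; real data will rarely be this tidy.

```stata
clear all
set seed 2012
set obs 500
generate x = runiform()
* multiplicative error, so the spread of y grows with its level
generate y = exp(1 + 2*x + rnormal(0, 0.5))

regress y x
* residuals versus fitted values: expect a fan shape
rvfplot, yline(0) name(raw, replace)

generate lny = ln(y)
regress lny x
* on the log scale the spread should look roughly constant
rvfplot, yline(0) name(logged, replace)
```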

Choosing transformations often requires thought.  It should not be
reduced to a simple rule.  The transformations need to make sense in
the context of the data.

Lloyd Dumont

>> Does anyone know of a visual guide to variable transformations?  I have seen many decent verbal explanations of whether, when, and specifically how to transform variables.  But is there a single resource that shows which transformation is appropriate when?  For example, something like...
>> When an indep variable is distributed as it is in Figure 1a and is related to the dep var as shown here in Figure 1b, then you should use the _____ transformation.  Then, the transformed indep variable will be displayed as in Figure 1c (which I imagine will almost always be something like a normal distribution) and the relationship between the transformed variable and the dep var will be as displayed in Figure 1d (which I imagine will almost always be linear).
>> Of course, it all gets a little more complicated if we start talking about transforming the dep var, though this sort of transformation could also easily be displayed and explained visually.
>> Does anyone know of such a resource?  If not, why not?
