Review of most common transformations
-------------------------------------

The most useful transformations in introductory data analysis are

the ^reciprocal^ x to 1/x and the ^negative reciprocal^ x to -1/x. This
is a very strong transformation with a drastic effect on distribution
shape. It cannot be applied to zero values. Although it can be applied
to negative values, it is not useful unless all values are positive. The
reciprocal of a ratio may often be interpreted as easily as the ratio
itself: e.g.

population density (people per unit area) becomes area per
person

persons per doctor becomes doctors per person

rates of erosion become time to erode a unit depth.

(In practice, we might want to multiply or divide the results of taking
the reciprocal by some constant, such as 1000 or 10000, to get numbers
that are easy to manage, but that itself has no effect on skewness or
linearity.)
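
As a quick check of that last point, here is a minimal sketch in Python
(which is not part of this help file). The persons-per-doctor figures
are invented for illustration, and skewness() is a hypothetical helper
computing moment-based skewness. The sketch takes reciprocals, rescales
them to doctors per 1000 persons, and shows that the rescaling leaves
skewness unchanged.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical persons-per-doctor figures, invented for illustration.
persons_per_doctor = [250.0, 400.0, 800.0, 2000.0, 10000.0]

# Reciprocal: doctors per person (awkwardly small numbers).
doctors_per_person = [1 / x for x in persons_per_doctor]

# Multiply by a constant to get numbers that are easier to manage.
doctors_per_1000 = [1000 * r for r in doctors_per_person]

print(doctors_per_1000)
print(skewness(doctors_per_person), skewness(doctors_per_1000))
```

The two skewness values printed should agree up to rounding error,
because multiplying by a positive constant changes only the scale.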

The reciprocal reverses order among values of the same sign: largest
becomes smallest, etc. The negative reciprocal preserves order among
values of the same sign.
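
The effect on order can be seen in a small Python sketch. The values
are made up and all positive; nothing here is specific to Stata.

```python
values = [0.5, 1.0, 2.0, 4.0]   # all positive, in ascending order

reciprocal = [1 / x for x in values]
negative_reciprocal = [-1 / x for x in values]

print(reciprocal)            # descending: the order is reversed
print(negative_reciprocal)   # ascending: the order is preserved
```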

the ^logarithm^ x to log base 10 of x OR
              x to log base e of x (ln x) OR
              x to log base 2 of x.
This is a strong transformation with a major effect on distribution
shape. It is commonly used for reducing right skewness and is often
appropriate for measured variables. It cannot be applied to zero or
negative values. One unit on a logarithmic scale means a multiplication
by the base of logarithms being used. Exponential growth or decline

                        bx
                  y = ae

is made linear by

               ln y = ln a + bx

so that the response variable y should be logged. (Here e is a special
number, approximately 2.71828, that is the base of natural logarithms.)

An aside on this ^exponential growth or decline^ equation: put x = 0,
and

                        0
                 y  = ae  = a,

so a is the amount or count when x = 0. If a > 0 and b > 0, then y grows at
a faster and faster rate (e.g. compound interest or unchecked population
growth), whereas if a > 0 and b < 0, y declines at a slower and slower
rate (e.g. radioactive decay).
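
The linearization of exponential growth can be illustrated with a
minimal Python sketch. The values a = 3 and b = 0.5 are assumptions
chosen for illustration, the data are generated without noise, and
fit_line() is a hypothetical helper doing ordinary least squares.
Regressing ln y on x should recover ln a as the intercept and b as the
slope.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = 3.0, 0.5                        # assumed values, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [a * math.exp(b * xi) for xi in x]

# Log the response only; x is left as it is.
ln_y = [math.log(yi) for yi in y]
intercept, slope = fit_line(x, ln_y)

a_hat = math.exp(intercept)            # back-transform the intercept
b_hat = slope
print(a_hat, b_hat)
print(y[0])                            # y at x = 0 equals a
```

With real data the points would scatter about the line, and the
estimates would only approximate a and b.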

Power functions

                        b
                  y = ax

are made linear by

              log y = log a + b log x

so that both variables y and x should be logged.

An aside on such ^power functions^: put x = 0, and for b > 0,

                        b
                 y  = ax  = 0,

so the power function for positive b goes through the origin, which
often makes physical or biological or economic sense. Think: does zero
for x imply zero for y? This kind of power function is a shape
that fits many data sets rather well.
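
A similar Python sketch for power functions follows. The values a = 2
and b = 1.5 are assumptions chosen for illustration, the data are
noise-free, and fit_line() is a hypothetical least-squares helper. Here
both y and x are logged (base 10, although any base would do), and the
slope of the fitted line estimates the power b.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = 2.0, 1.5                        # assumed values, for illustration only
x = [1.0, 2.0, 4.0, 8.0, 16.0]         # strictly positive, so logs exist
y = [a * xi ** b for xi in x]

log_x = [math.log10(xi) for xi in x]
log_y = [math.log10(yi) for yi in y]
intercept, slope = fit_line(log_x, log_y)

a_hat = 10 ** intercept                # back-transform the intercept
b_hat = slope
print(a_hat, b_hat)

y_at_zero = a * 0.0 ** b               # for b > 0 the curve passes through the origin
print(y_at_zero)
```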

Consider ratios

                 y  = p / q

where p and q are both positive in practice. Examples are

                 males / females

                 dependants / workers

                 downstream length / downvalley length

Then y is somewhere between 0 and infinity, or in the last case, between
1 and infinity. If p = q, then y = 1. Such definitions often lead to
skewed data, because there is a definite lower limit and no definite
upper limit. The logarithm, however, namely

              log y = log (p / q) = log p - log q,

is somewhere between -infinity and infinity and p = q means that log y =
0. Hence the logarithm of such a ratio is likely to be more
symmetrically distributed.
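
The symmetry argument can be checked with a few made-up pairs in
Python. The pairs below are invented so that each ratio p / q has a
mirror image q / p. On the original scale the ratios are bunched
between 0 and 1 and stretched out above 1; on the log scale they are
symmetric about 0.

```python
import math

# Hypothetical (p, q) pairs, invented for illustration.
pairs = [(1.0, 4.0), (1.0, 2.0), (1.0, 1.0), (2.0, 1.0), (4.0, 1.0)]

ratios = [p / q for p, q in pairs]
log_ratios = [math.log(p / q) for p, q in pairs]
log_differences = [math.log(p) - math.log(q) for p, q in pairs]

print(ratios)        # 0.25, 0.5, 1, 2, 4: asymmetric about 1
print(log_ratios)    # symmetric about 0
```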

the ^cube root^ x to x^^(1/3). This is a fairly strong transformation
with a substantial effect on distribution shape: it is weaker than the
logarithm. It is also used for reducing right skewness, and has the
advantage that it can be applied to zero and negative values. Note that
the cube root of a volume has the units of a length. It is commonly
applied to rainfall data.

the ^square root^ x to x^^(1/2) = sqrt(x). This is a transformation with
a moderate effect on distribution shape: it is weaker than the logarithm
and the cube root. It is also used for reducing right skewness, and also
has the advantage that it can be applied to zero values. Note that the
square root of an area has the units of a length. It is commonly applied
to counted data, especially if the values are mostly rather small.
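
The relative strength of the square root, cube root and logarithm can
be seen in a small Python sketch. The data are invented: each value is
double the one before, which is deliberately right-skewed and becomes
exactly symmetric when logged. skewness() is a hypothetical helper
computing moment-based skewness. The ordering shown is typical of
right-skewed data but is not guaranteed for every data set.

```python
import math

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed data: successive doublings.
data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

results = {
    "raw": skewness(data),
    "square root": skewness([math.sqrt(v) for v in data]),
    "cube root": skewness([v ** (1 / 3) for v in data]),
    "log base 2": skewness([math.log2(v) for v in data]),
}

for name, value in results.items():
    print(f"{name:12s} {value:6.3f}")
```

For these data, skewness falls steadily from the raw values through the
square root and the cube root to the logarithm.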

the ^square^ x to x^^2. This transformation has a moderate effect on
distribution shape and it could be used to reduce left skewness. In
practice, the main reason for using it is to fit a response by a
quadratic function y = a + b x + c x^^2. Quadratics have a turning
point, either a maximum or a minimum, although the turning point in a
function fitted to data might be far beyond the limits of the
observations. The distance of a body from an origin is a quadratic
function of time if that body is moving under constant acceleration,
which gives a very clear physical justification for using a quadratic.
Otherwise quadratics are typically used solely because they can mimic a
relationship within the data region. Outside that region they may behave
very poorly, because they take on arbitrarily large values for extreme
values of x, and unless the intercept a is constrained to be 0, they may
behave unrealistically close to the origin.

Squaring usually makes sense only if the variable concerned is zero or
positive, given that (-x)^^2 and x^^2 are identical.
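
Two of the points about squares and quadratics can be sketched in
Python. The coefficients are assumptions chosen for illustration. The
turning point is located by the standard calculus result that the slope
b + 2cx is zero at x = -b / (2c); that formula is not stated in the
text above. The second part shows that squaring discards the sign of
the values.

```python
def quadratic(x, a, b, c):
    """Evaluate y = a + b x + c x squared."""
    return a + b * x + c * x ** 2

a, b, c = 1.0, -4.0, 2.0             # assumed coefficients, for illustration only

turning_x = -b / (2 * c)             # where the slope b + 2 c x equals zero
turning_y = quadratic(turning_x, a, b, c)
kind = "minimum" if c > 0 else "maximum"
print(turning_x, turning_y, kind)

# Squaring a variable with mixed signs loses the sign information.
mixed = [-3.0, -1.0, 0.0, 1.0, 3.0]
squared = [v ** 2 for v in mixed]
print(squared)
```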

The main criterion in choosing a transformation is: what works with the
data? As the examples above indicate, it is also important to consider
two questions.

- What makes physical (biological, economic, whatever) sense, for
  example in terms of limiting behaviour as values get very small or
  very large? This question often leads to the use of logarithms.

- Can we keep dimensions and units simple and convenient? If possible,
  we prefer measurement scales that are easy to think about. The cube
  root of a volume and the square root of an area both have the
  dimensions of length, so far from complicating matters, such
  transformations may simplify them. Reciprocals usually have simple
  units, as mentioned earlier. Often, however, somewhat complicated
  units are a sacrifice that has to be made.

Also see:
---------

Reasons for using transformations                         help @trreason@

Transformations for proportions and percents              help @trpropor@

Psychological comments -- for the puzzled                 help @trpsych@

How to do transformations in Stata                        help @trstata@