
st: -transint- updated on SSC


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: -transint- updated on SSC
Date   Wed, 23 Nov 2005 00:52:20 -0000

Thanks to Kit Baum, a revised version of -transint- 
is now available from SSC. 

-transint- is just a Stata help file containing some material
I wrote on transformations. Thus if you install -transint-, 
-help transint- or -whelp transint- will give you a view of 
this material. 

This arose out of various teaching and advising within my 
Department. As I am a geographer preparing material for geographers, 
the material should be considered in that light. I had not 
found a treatment of transformations that matched what
I wanted to be available, so I wrote one. My students and 
colleagues are not statisticians, any more than I am, but the 
colleagues concerned are strong quantitatively-minded scientists. 

Naturally the hope is that this material will be interesting or 
useful to others. I would appreciate comments, including information 
on errors or omissions. 

The previous version was published in 1999 and is re-released as 
transint6.hlp with some small fixes. The new version is much longer. 
You will be able to read -transint- in versions of Stata before 8, 
but much or all of the SMCL mark-up will be visible to you. 

The stimulus for re-releasing this was discovering a vein of ideas on 
how to transform variables that are both negative and positive, but 
heavy-tailed. This extract from transint.hlp may give some of the
flavour. It also indicates a topic on which I welcome references
and comments from Statalist members. 

============================= extract 

Transformations for variables that are both positive and negative

    Most of the literature on transformations focuses on one or both of two
    related situations: the variable concerned is strictly positive; or it is
    zero or positive. If the first situation does not hold, some transformations
    do not yield real number results (notably, logarithms and reciprocals); if
    the second situation does not hold, then some other transformations do not
    yield real number results or more generally do not appear useful (notably,
    cube roots, square roots or squares).

    However, in some situations response variables in particular can be both
    positive and negative. This is common whenever the response is a balance,
    change, difference or derivative. Although such variables are often skew, the
    most awkward property that may invite transformation is heavy (long or fat)
    tails, high kurtosis in one terminology.  Zero usually has a strong
    substantive meaning, so that we wish to preserve the distinction between
    negative, zero and positive values. (Note that Celsius or Fahrenheit
    temperatures do not really qualify here, as their zero points are
    statistically arbitrary, for all the importance of whether water melts or
    freezes.)

    In these circumstances, experience with right-skewed and strictly positive
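    variables might suggest looking for a transformation that behaves like ln x
    when x is positive and like -ln(-x) when x is negative.  This still leaves the
    problem of what to do with zeros. In addition, it is clear from any sketch
    that (in Stata terms)

        cond(x <= 0, -ln(-x), ln(x))

    would be useless: it is undefined at zero, and it tends to minus infinity as
    x approaches zero from above but to plus infinity as x approaches zero from
    below, so it is not monotonic. One way forward is to use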

        -ln(-x + 1)   if x <= 0, 
        ln(x + 1)     if x > 0.  

    This can also be written

        sign(x) ln(|x| + 1)

    where sign(x) is 1 if x > 0, 0 if x == 0 and -1 if x < 0.  This function
    passes through the origin, behaves like x for small x, positive and negative,
    and like sign(x) ln(|x|) for large |x|.  The gradient is steepest at x = 0,
    where it equals 1, so the transformation pulls in extreme values relative to
    those near
    the origin.  It has recently been dubbed the neglog transformation (Whittaker
    et al. 2005).  An earlier reference is John and Draper (1980).  In Stata
    language, this could be

        cond(x <= 0, -ln(-x + 1), ln(x + 1))

    or

        sign(x) * ln(abs(x) + 1) 
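
    As a minimal sketch of how this might look in practice, the following
    applies the sign() form to a few made-up values and then back-transforms.
    The variable names y, neglog_y and y_back are hypothetical, the data are
    purely illustrative, and the back-transform sign(z) (exp(|z|) - 1) is
    simply the algebraic inverse of the formula above.

```stata
* Sketch only: y, neglog_y and y_back are hypothetical names on toy data
clear
input double y
-1000
-10
-1
0
1
10
1000
end

* neglog: sign(y) * ln(|y| + 1); zero maps to zero and the sign is preserved
generate double neglog_y = sign(y) * ln(abs(y) + 1)

* back-transform: sign(z) * (exp(|z|) - 1)
generate double y_back = sign(neglog_y) * (exp(abs(neglog_y)) - 1)

list y neglog_y y_back, noobs
```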

    A suitable generalisation to powers p other than 0 is

        -[(-x + 1)^p - 1] / p    if x <= 0, 
          [(x + 1)^p - 1] / p    if x > 0. 
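
    The two branches can be collapsed with sign() and abs(), just as for
    neglog. A minimal sketch follows, assuming an illustrative power of 0.5
    and hypothetical variable names (y, negpow_y, neglog_y, near0); the last
    variable uses a very small power to show the approach to neglog as p
    tends to 0.

```stata
* Sketch only: the power and all variable names are hypothetical
clear
set obs 201
* grid from -50 to 50 in steps of 0.5
generate double y = (_n - 101) / 2

* any nonzero power
local p = 0.5

* sign(y) * ((|y| + 1)^p - 1) / p covers both branches at once
generate double negpow_y = sign(y) * ((abs(y) + 1)^`p' - 1) / `p'

* for comparison: neglog, and the same family with a power close to 0
generate double neglog_y = sign(y) * ln(abs(y) + 1)
generate double near0 = sign(y) * ((abs(y) + 1)^0.001 - 1) / 0.001

summarize negpow_y neglog_y near0
```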

    Transformations that affect skewness as well as heavy tails in variables that
    are both positive and negative were discussed by Yeo and Johnson (2000).

    Another possibility in this terrain is to apply the inverse hyperbolic
    function arsinh (also known as arg sinh and arcsinh). This is the inverse of
    the sinh function, which in turn is defined as

        sinh(x) = (exp(x) - exp(-x)) / 2. 

    The arsinh function can be computed in Stata as

        ln(x + sqrt(x^2 + 1)) 

    It too passes through the origin and is steepest at the origin.  For large
    |x| it behaves like sign(x) ln(|2x|). So in practice neglog(x) and arsinh(x)
    have loosely similar effects.
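
    A rough numerical comparison can be sketched as follows. The variable
    names are hypothetical and the data made up. The sign-symmetric variant
    asinh_sym is an assumption added here, not part of the text above: it is
    algebraically the same function, but it avoids subtracting two nearly
    equal numbers when x is large and negative, where the direct formula can
    lose precision.

```stata
* Sketch only: all variable names are hypothetical, data are illustrative
clear
input double y
-1e6
-100
-1
0
1
100
1e6
end

* formula as given in the text
generate double asinh_y = ln(y + sqrt(y^2 + 1))

* sign-symmetric variant (assumption): same function, safer for large negative y
generate double asinh_sym = sign(y) * ln(abs(y) + sqrt(y^2 + 1))

* neglog for comparison
generate double neglog_y = sign(y) * ln(abs(y) + 1)

* for large |y| the gap in absolute values approaches ln(2), about 0.693
generate double gap = abs(asinh_sym) - abs(neglog_y)

list, noobs
```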

    John, J.A. and N.R. Draper. 1980.  An alternative family of transformations.
        Applied Statistics 29: 190-197.

    Whittaker, J., J. Whitehead and M. Somers. 2005.  The neglog transformation
        and quantile regression for the analysis of a large credit scoring
        database.  Applied Statistics 54: 863-878.

    Yeo, I. and R.A. Johnson. 2000.  A new family of power transformations to
        improve normality or symmetry.  Biometrika 87: 954-959.

============================= end of extract 
