Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: visual guide to variable transformations?
Date: Fri, 8 Jun 2012 00:55:11 +0100

I agree with Austin and David. The business of why, when and how to transform is rather too complicated to reduce easily to a very concise statement. Nevertheless I wrote a Stata-linked guide to transformations that is downloadable as a help file. It can be found on SSC at -transint-.

David is too modest to underline that some of the best expository material on transformations is still that to be found in a book he co-edited with Frederick Mosteller and John Tukey:

http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471384917,descCd-tableOfContents.html

not to mention the evergreen

Hoaglin, D.C. 1988. Transformations in everyday experience. Chance 1(4): 40--45.

Nick

Austin Nichols
> Why not? Such advice would be generically incorrect.
> You are assuming only a bivariate relationship among continuous variables,
> but even in such a restricted setting, linearity and normality are far
> from required, and it is unclear how you would discern from
> most scatterplots how to get there even if they were indicated.
>
> E.g.
>
> clear all
> set seed 1
> drawnorm z e, n(1000)
> g x=normal(z)
> g y=x*exp(e)
> lpoly y x
> g y2=x+rnormal(exp(e),x)
> lpoly y2 x
>
> That said, a review of available -glm- links and
> common -nl- specifications might make a good FAQ.

David Hoaglin

Quite a lot has been written about transformations, including their role in regression modeling. I'll have to look for material that approaches "a visual guide."

For now, I would like to correct the misimpression that, after transformation, the data on an independent variable should resemble a normal distribution. I would not transform an independent variable for that reason.

In the context of a regression model, the main aim in transforming an independent variable is to promote linearity of the relation between the dependent variable and the independent variable (as you describe for Figure 1d). Promoting linearity is also an important aim in transforming the dependent variable.
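
The linearity point can be illustrated with a small simulation in the spirit of Austin's example. This is only a sketch under stated assumptions: the data are simulated, and the variable names (x, lnx, y), the coefficients, and the seed are invented for illustration and do not come from the thread. The predictor is roughly uniform, so its logarithm is far from normal, yet taking logs is what straightens the relationship.

```stata
* Illustrative sketch only: simulated data, hypothetical names and coefficients
clear all
set seed 2012
set obs 500

* Predictor roughly uniform on [1, 100]; neither x nor ln(x) is normal
generate x = 1 + 99*runiform()
generate lnx = ln(x)

* By construction, y is linear in ln(x), not in x
generate y = 2 + 3*lnx + rnormal(0, 1)

* Raw scale: the local polynomial smooth is curved
lpoly y x, name(raw, replace)

* Log scale: the smooth is close to a straight line
lpoly y lnx, name(logged, replace)

* The transformed predictor is still clearly non-normal (see skewness)
summarize lnx, detail
```

Comparing the two -lpoly- graphs is the check that matters here; the shape of the distribution of lnx plays no part in the choice.
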
Also, if the model involves more than one independent variable, transforming the dependent variable may make the contributions of the independent variables more nearly additive (i.e., reduce or remove interactions among the independent variables).

Another reason for transforming the dependent variable is to make residual variability more nearly constant across the range of that variable. One usually checks on this by making various plots of residuals.

Choosing transformations often requires thought. It should not be reduced to a simple rule. The transformations need to make sense in the context of the data.

Lloyd Dumont
>> Does anyone know of a visual guide to variable transformations? I have seen many decent verbal explanations of whether, when, and specifically how to transform variables. But, is there a single resource that shows which transformation is appropriate when? For example, something like...
>>
>> When an indep variable is distributed as it is in Figure 1a and is related to the dep var as shown here in Figure 1b, then you should use the _____ transformation. Then, the transformed indep variable will be displayed as in Figure 1c (which I imagine will almost always be something like a normal distribution) and the relationship between the transformed variable and the dep var will be as displayed in Figure 1d (which I imagine will almost always be linear).
>>
>> Of course, it all gets a little more complicated if we start talking about transforming the dep var, though this sort of transformation could also easily be displayed and explained visually.
>>
>> Does anyone know of such a resource? If not, why not?