

From: Maarten buis <maartenbuis@yahoo.co.uk>
To: stata list <statalist@hsphsun2.harvard.edu>
Subject: RE: st: correcting skewness of an indep variables
Date: Mon, 15 Mar 2010 16:27:39 +0000 (GMT)

--- Fabio Zona asked:
> When does one need to correct the skewness of an independent
> variable? I have a logit regression and my indep variable is
> strongly skewed; do I need to correct this (by using lnskew0 )??

Never. The only thing you need to check is whether a linear
relationship between your dependent and independent variable is a
reasonable summary of that effect. A very skewed independent variable
is sometimes a sign that the effect of that variable might be
non-linear. Consider the following dataset:

*------------- begin example ---------------
use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
spikeplot articles
*------------- end example ------------------

Do we think that moving from 0 to 1 published articles has the same
effect on someone's academic career as moving from 60 to 61 articles?
I don't believe so; there are probably "decreasing returns to
publications". So here I would probably log transform articles, so
that a percentage increase in the number of published articles has a
constant effect. The skew here gave a hint (actually, the range of
that variable was the first thing that triggered my suspicion about
this variable), but the argument I used to justify the transformation
has to do with the relationship between the dependent and independent
variable.

Another reason for skewness is the presence of a spike, that is, a
single value that is very common. In that case you could consider
adding the variable linearly plus a dummy indicating whether or not an
observation belongs to the spike group. We would do that if we think
that value is in some sense special (this is often the case when the
spike value is 0). Say we have data on the proportion of the woman's
income in the total family income. In more traditional countries like
Germany we might expect a spike at zero. In this case adding the
proportion + dummy could make sense.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
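
The two suggestions in the message above (log transforming a count with decreasing returns, and entering a spiked variable linearly plus a dummy) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's own code. The outcome name tenure is assumed from the dataset name and should be checked with describe. Adding 1 before taking logs is one possible way to keep zero counts, not something the message prescribes. The second part uses simulated data with made-up variable names (share, zero, y) and made-up coefficients.

```stata
*------------- begin example ---------------
* Part 1: linear versus logged articles in a logit.
* Assumption: the dataset has a binary outcome called tenure.
use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
describe

* ln(0) is missing in Stata, so ln(articles) would drop anyone with
* 0 articles; adding 1 first is an assumption made for this sketch.
generate ln_articles = ln(articles + 1)

logit tenure articles
estimates store linear
logit tenure ln_articles
estimates store logged

* Compare the two specifications on AIC and BIC.
estimates stats linear logged

* Part 2: a variable with a spike at 0, entered linearly plus a dummy.
* All data below are simulated and purely hypothetical.
clear
set seed 12345
set obs 1000

* share: proportion of the woman's income in family income,
* with roughly 30% of observations exactly at 0.
generate share = cond(runiform() < 0.3, 0, runiform())

* zero: 1 if the observation belongs to the spike group, 0 otherwise.
generate byte zero = (share == 0)

* y: made-up binary outcome, only so that the model can be fitted.
generate byte y = runiform() < invlogit(-0.5 + 1.5*share - 0.8*zero)

logit y share zero
*------------- end example ------------------
```

In the second model the coefficient of zero captures how far the spike group departs from what the linear effect of share alone would predict at 0. Whether to include that dummy still rests on the substantive argument that the value is special, not on the skew itself.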

