RE: st: correcting skewness of an indep variables

From   Maarten buis <>
To   stata list <>
Subject   RE: st: correcting skewness of an indep variables
Date   Mon, 15 Mar 2010 16:27:39 +0000 (GMT)

--- Fabio Zona asked:
> When does one need to correct the skewness of an independent 
> variable? I have a logit regression and my indep variable is
> strongly skewed; do I need to correct this (by using lnskew0 )??

Never. The only thing you need to make sure of is that a
linear relationship between your dependent and independent
variable is a reasonable summary of that effect (in a logit
model: linear in the log odds). A very skewed independent
variable is sometimes a sign that the effect of that
variable might be non-linear. Consider the following dataset:

*------------- begin example ---------------
use "", clear
spikeplot articles
*------------- end example ------------------

Do we think that moving from 0 to 1 published articles has 
the same effect on someone's academic career as moving from 
60 to 61 articles? I don't believe so; there are probably 
"decreasing returns to publications". So here I would 
probably log transform articles, so that a percentage
increase in the number of published articles has a 
constant effect. The skew here gave a hint (actually, the
range of that variable was the first thing that triggered
my suspicion about this variable), but the argument I used
to justify the transformation has to do with the relationship
between the dependent and independent variable.
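
A minimal sketch of that comparison, using simulated data
rather than the dataset above. All variable names and numbers
here (articles, lnart, tenure, the coefficients) are made up
for illustration. Because the simulated counts include zeros,
the sketch uses ln(articles + 1), a common workaround that is
my assumption here and that only approximates the "constant
effect of a percentage increase" interpretation:

*------------- begin example ---------------
* simulated data; names and coefficients are hypothetical
clear
set seed 12345
set obs 1000
* skewed count of articles, zeros included
gen articles = floor(exp(rnormal(1.5, 1)))
* +1 only because ln(0) is undefined
gen lnart = ln(articles + 1)
* outcome generated with decreasing returns to articles
gen byte tenure = runiform() < invlogit(-2 + 1*lnart)
* linear in articles versus linear in log articles
logit tenure articles
estimates store linear
logit tenure lnart
estimates store logged
* compare the two specifications by AIC and BIC
estimates stats linear logged
*------------- end example ------------------

The choice between the two specifications should still rest
on the substantive argument; the fit statistics are only a
check on it.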

Another reason for skewness is the presence of a spike,
that is, a single value that is very common. In that case
you could consider adding the variable linearly plus a dummy
indicating whether or not an observation belongs to the 
spike group. We would do that if we think that that
value is in some sense special (this is often the case 
when the spike value is 0). Say we have data on the 
proportion of the woman's income in the total family income.
In more traditional countries like Germany we might expect a 
spike at zero. In this case adding the proportion plus the
dummy could make sense.
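
As a rough sketch of the proportion + dummy specification,
again with simulated data. The variable names (share, zero,
y), the 40% spike and the coefficients are all made up for
illustration:

*------------- begin example ---------------
* simulated data; names and coefficients are hypothetical
clear
set seed 12345
set obs 1000
* about 40% contribute nothing, the rest a share in (0,1)
gen share = cond(runiform() < .4, 0, runiform())
* dummy marking the spike group
gen byte zero = (share == 0)
* outcome depends on the share and on being in the spike
gen byte y = runiform() < invlogit(-.5 + 1.5*share + .8*zero)
logit y share zero
*------------- end example ------------------

Here the coefficient of share is identified by the 
observations outside the spike, and the coefficient of zero 
is the difference between the spike group and what the 
linear effect would predict at a share of 0.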

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

