Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: correcting skewness of an indep variables

From	David Hoaglin <[email protected]>
To	[email protected]
Subject	Re: st: correcting skewness of an indep variables
Date	Sun, 21 Jul 2013 09:24:12 -0400

Dimitrie,

The skewness of a predictor variable is not necessarily a problem, and
neither is a spike at 0.  The first step should be to examine whether
the relation between the dependent variable and each of the predictors
(in the full regression model) departs systematically from being
linear.  Various plots of residuals can help you do this.
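
A minimal sketch of one such residual check, written in Python with a single predictor for brevity (the full model would have several). It fits a straight line, sorts the residuals by the predictor, and averages them within bins; bin means that swing systematically away from zero suggest a departure from linearity. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def binned_residual_means(x, y, n_bins=3):
    """Mean residual from the straight-line fit within each bin of x."""
    a, b = fit_line(x, y)
    # Pair each x with its residual, then order by x.
    pairs = sorted((xi, yi - (a + b * xi)) for xi, yi in zip(x, y))
    size = len(pairs) // n_bins
    means = []
    for k in range(n_bins):
        lo = k * size
        hi = (k + 1) * size if k < n_bins - 1 else len(pairs)
        means.append(mean([r for _, r in pairs[lo:hi]]))
    return means

# Toy data (hypothetical): one truly linear response, one curved response.
x = list(range(12))
linear_y = [2 + 3 * xi for xi in x]
curved_y = [xi ** 2 for xi in x]

print(binned_residual_means(x, linear_y))  # all bin means close to 0
print(binned_residual_means(x, curved_y))  # positive, negative, positive
```

A plot of residuals against the predictor shows the same pattern visually; the binned means are just a numerical summary of it.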

If the data on the dependent variable when the predictor is 0 behave
differently from the data on the dependent variable when the predictor
is > 0, you may need to model the two parts separately (as a sort of
mixture).  You can try omitting all the observations in which the
predictor is 0 and fitting a separate regression to the remaining
data.
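
A rough sketch of that comparison, in Python and with a single predictor (an assumption made for brevity): fit the line only to observations with predictor > 0, then compare the line's value extrapolated to 0 with the observed mean of the dependent variable at 0. The names and data below are hypothetical. In this one-predictor setting, the gap computed here equals the coefficient that an indicator for "predictor is 0" would receive if it were added alongside the linear term, because the indicator absorbs the zero group and leaves the line to be determined by the positive observations.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def two_part_summary(x, y):
    """Compare y at x == 0 with the line fitted to x > 0 only."""
    y_at_zero = [yi for xi, yi in zip(x, y) if xi == 0]
    positive = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
    a, b = fit_line([p[0] for p in positive], [p[1] for p in positive])
    return {
        "mean_y_at_zero": mean(y_at_zero),
        "intercept_pos": a,   # fitted line extrapolated to x = 0
        "slope_pos": b,
        "gap_at_zero": mean(y_at_zero) - a,
    }

# Toy data (hypothetical): a spike at 0 where y sits well above the line.
x = [0, 0, 0, 0, 1, 2, 3, 4, 5]
y = [10, 10, 10, 10, 3, 5, 7, 9, 11]

summary = two_part_summary(x, y)
print(summary)
```

A gap near zero would suggest that the observations at 0 follow the same line as the rest; a large gap, as in this toy data, is the situation in which modeling the two parts separately may be needed.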

In a response on Cross Validated, you mentioned that your data came
from natural disasters over a span of years.  Should time be a
predictor in your models?

Without more information on your data, I can only offer general suggestions.

David Hoaglin

On Sun, Jul 21, 2013 at 7:31 AM, Mihes, Dimitrie
<[email protected]> wrote:
> Apologies, allow me to correct myself. The issue I've mentioned has also been addressed in
> http://stats.stackexchange.com/questions/64714/count-data-as-an-independent-variable-in-ols-using-a-dummy-variable-the-variab?noredirect=1#comment124994_64714
>
> However, the proposed solution seems to be in contrast with that proposed in this thread (which I had mistakenly not mentioned)
> http://www.stata.com/statalist/archive/2010-03/msg01034.html
>
> From my understanding, the former suggests using a dummy variable to account for a spike at 0 (for a predictor based on count data) only when zero means unobserved or truncated data, whereas the latter suggests either looking for a non-linear relationship between the variables (in which case a log transformation is proposed) or adding a dummy variable plus the skewed variable linearly, even when the zeros represent the true value.
> I am conflicted between the two: the former suggests that the dummy variable is useless when zeros are the observed values, while the latter, which advocates this technique when 0 is the true value, lacks a more elaborate explanation of how to interpret the dummy alongside the linear variable and of the process through which the dummy variable controls for the spike at 0.
>
> Moreover, using a log-transformation renders the 0 values as "missing values".
>
> Thanks for your consideration.