Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: correcting skewness of an indep variables


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: correcting skewness of an indep variables
Date   Sun, 21 Jul 2013 11:20:35 +0100

This is in essence a cross-posting of

http://stats.stackexchange.com/questions/64714/count-data-as-an-independent-variable-in-ols-using-a-dummy-variable-the-variab

The Statalist FAQ is quite explicit about expected behaviour here: see
http://www.stata.com/support/faqs/resources/statalist-faq/#crossposting

"People posting on Statalist may also think about posting the same
question on other listservers or in web forums. There is absolutely no
rule against doing that; it is not our business to constrain what you
do elsewhere.

But if you do post elsewhere, we ask that you provide cross-references
in URL form to searchable archives. That way, people interested in
your question can quickly check what has been said elsewhere and avoid
posting similar comments. Being open about cross-posting saves
everyone time.

Cross-posting does not affect the request elsewhere in this FAQ that
you close threads on Statalist. If your question was answered well
elsewhere, you are asked to post a cross-reference to that in a
closure on Statalist."

In this case, the question has already received much comment on Cross
Validated, which appears to have been ignored here. Also, the reference
to "this thread" is cryptic, as no thread is identified.

Nick
njcoxstata@gmail.com


On 21 July 2013 10:52, Mihes, Dimitrie <dimitrie.mihes.12@ucl.ac.uk> wrote:
> I am using OLS to model the relationship between the amount of foreign aid (dependent variable, logged) and media coverage (number of newspaper articles, a count variable). I assume a linear relationship between the two and use the "media coverage" variable as a continuous predictor. There is, however, a spike at 0 in the count variable, which makes it highly skewed to the right.
> Although this problem was addressed in this thread, I would like to better understand why using a dummy variable alongside the original variable would improve the model. What does the dummy variable do for the skewness? How can it be interpreted in parallel with the continuous part of the model? Moreover, do the values of 0 in the count variable have to be kept, or should they be replaced with missing values?
> I am also using a second IV, measuring the number of negative articles, which naturally spikes at 0 as well, but has more values of 0 than the "total number of articles" variable. Does the dummy variable control for the zeros in this variable as well?
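
For readers unsure what "a dummy variable alongside the original variable" means mechanically, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated, the names articles, log_aid and zero are hypothetical, and it is not the analysis from the Cross Validated thread. It fits log_aid on an intercept, the count, and an indicator for count == 0, with the zeros kept in the data rather than set to missing. It then checks two algebraic consequences: the intercept and slope equal those from a fit to the nonzero observations alone, and the intercept plus the dummy coefficient equals the mean of the outcome among the zeros.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


random.seed(1)

# Simulated, hypothetical data: 40 zero counts and 60 positive counts.
articles = [0] * 40 + [random.randint(1, 30) for _ in range(60)]
log_aid = [
    2.0 + 0.05 * x + (0.8 if x == 0 else 0.0) + random.gauss(0, 0.3)
    for x in articles
]

# Model with the count and a zero indicator; zeros stay in the data.
zero = [1.0 if x == 0 else 0.0 for x in articles]
X = [[1.0, float(x), d] for x, d in zip(articles, zero)]
b0, b1, b2 = ols(X, log_aid)

# Simple regression using only the observations with a positive count.
pos = [(x, y) for x, y in zip(articles, log_aid) if x > 0]
p0, p1 = ols([[1.0, float(x)] for x, _ in pos], [y for _, y in pos])

# Mean of the outcome among the zero-count observations.
zero_y = [y for x, y in zip(articles, log_aid) if x == 0]
mean_zero = sum(zero_y) / len(zero_y)

print("with dummy:     intercept, slope, dummy =", b0, b1, b2)
print("positives only: intercept, slope        =", p0, p1)
print("fitted value at zero vs mean among zeros:", b0 + b2, mean_zero)
```

In this sketch the dummy does nothing to the skewness of the predictor itself. It gives the zero-count observations their own level, so that the slope is determined by the positive counts only. An indicator built from one variable says nothing about zeros in a different variable, which would need its own indicator under the same logic.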

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

