
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Right skewed (positive) dependent variable


From   SURYADIPTA ROY <[email protected]>
To   [email protected]
Subject   Re: st: Right skewed (positive) dependent variable
Date   Thu, 10 Jun 2010 16:02:30 -0400

Maarten,
You are right: there is a huge amount of clustering of data very close
to 0 (but not equal to 0). For the different dependent variables, the
skewness ranges from 4.20 to 5.42. I believe that one motivation for my
using the log(1+oldvar) transformation initially was to avoid any
missing-value problem. Incidentally, for regressions with dependent
variables transformed after -ladder-, the rvfplots looked very nice, with
the residuals scattered evenly around the 0-line. I believe that is
what we are really after, and not the normality of the dependent
variable? Anyway, I am studying -glm- very carefully for
implementation.

Thanks to all of you for very helpful discussions and suggestions!
Suryadipta.

On Thu, Jun 10, 2010 at 11:51 AM, Maarten buis <[email protected]> wrote:
> --- On Thu, 10/6/10, SURYADIPTA ROY wrote:
>> However, as I look at my program now, I discover
>> the source of the anomaly: my transformation
>> was newvar=ln(1+oldvar). That explains it.
>
> Are there 0s in your dependent variable (oldvar)?
> If there are, then you really have no choice other
> than go the -glm- route. There are ways of getting
> a meaningful interpretation out of a log transformed
> dependent variable, but no such way exists for the
> transformation log(oldvar + some constant), and
> leaving the constant out is no solution either, as
> that means that the 0s will be recoded to missing
> values. This may also explain your non-normality:
> is there a spike at 0? If that is the case, then
> there can be no transformation that will lead to
> a normal distribution. In that case you could
> consider modeling the zero separately using -zip-.
> It is usually used for counts, but can also be
> used for continuous variables in a quasi-likelihood
> kind of way, by specifying the -robust- option.
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
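
The points in the quoted message about zeros and log(oldvar + constant) can be illustrated with a small sketch. It is not taken from the thread: the toy data, the constant c = 1, and the helper names log_or_missing and implied_ratio are hypothetical, and Python's None stands in for a Stata missing value. It shows three things under those assumptions: a plain log turns the 0s into missing values, a spike at 0 survives a monotone transformation as a spike, and a given change on the ln(y + c) scale implies a different proportional change in y depending on the level of y, whereas on the ln(y) scale it is always exp(delta).

```python
import math

# Hypothetical toy data: right-skewed, non-negative, with two exact zeros.
oldvar = [0.0, 0.0, 0.01, 0.02, 0.05, 0.1, 0.5, 2.0, 10.0, 50.0]


def log_or_missing(y):
    """Return ln(y), or None (standing in for missing) when y <= 0."""
    return math.log(y) if y > 0 else None


# Plain log: the zeros are lost as missing values.
plain_log = [log_or_missing(y) for y in oldvar]
n_missing = sum(v is None for v in plain_log)

# ln(1 + y): no missing values, but the ties at 0 are still a spike.
shifted_log = [math.log(1 + y) for y in oldvar]
spike_size = shifted_log.count(0.0)


def implied_ratio(y, c, delta):
    """Proportional change in y implied by adding delta to ln(y + c)."""
    new_y = (y + c) * math.exp(delta) - c
    return new_y / y


if __name__ == "__main__":
    print("missing after ln(y):", n_missing)
    print("observations tied at 0 after ln(1 + y):", spike_size)
    for y in (0.01, 10.0):
        print("y =", y,
              "| ratio with c = 0:", implied_ratio(y, 0.0, 0.1),
              "| ratio with c = 1:", implied_ratio(y, 1.0, 0.1))
```

With c = 0 the implied ratio is the same at every level of y, which is what makes coefficients on a log scale interpretable as proportional effects; with c = 1 the ratio varies strongly with y, so no single proportional interpretation is available.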


