Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: SURYADIPTA ROY <sroy9163@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 16:02:30 -0400

Maarten,

You are right: there is a huge amount of clustering of data very close to 0 (but not equal to 0). For the different dependent variables, the skewness ranges from 4.20 to 5.42. I believe that one motivation for my using the log(1+oldvar) transformation initially was to avoid any missing value problem. Incidentally, for the regressions with dependent variables transformed after -ladder-, the rvfplots looked very good, with the residuals scattered nicely around the 0-line. I believe that is what we are really after, and not the normality of the dependent variable? Anyway, I am studying -glm- very carefully for implementation.

Thanks to all of you for the very helpful discussions and suggestions!

Suryadipta.

On Thu, Jun 10, 2010 at 11:51 AM, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> --- On Thu, 10/6/10, SURYADIPTA ROY wrote:
>> However, as I look at my program now, I discover
>> the source of the anomaly: my transformation
>> was newvar=ln(1+oldvar). That explains it.
>
> Are there 0s in your dependent variable (oldvar)?
> If there are, then you really have no choice other
> than to go the -glm- route. There are ways of getting
> a meaningful interpretation out of a log transformed
> dependent variable, but no such way exists for the
> transformation log(oldvar + some constant), and
> leaving the constant out is no solution either, as
> that means that the 0s will be recoded to missing
> values. This may also explain your non-normality:
> is there a spike at 0? If that is the case, then
> there can be no transformation that will lead to
> a normal distribution. In that case you could
> consider modeling the zeros separately using -zip-.
> It is usually used for counts, but can also be
> used for continuous variables in a quasi-likelihood
> kind of way, by specifying the -robust- option.
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
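For readers following the thread, the contrast between the log(1+oldvar) regression and the -glm- route might be sketched as below. This is only an illustration under stated assumptions: `y`, `x1`, and `x2` are hypothetical variable names, and the choice of `family(poisson)` is an assumption, since the thread does not say which family was intended. The log link models ln(E[y|x]) directly, so the dependent variable stays on its original scale and any zeros are kept.

```stata
* Sketch only: y, x1 and x2 are hypothetical variable names.

* Original approach: regress ln(1 + y) on the covariates.
* The coefficients describe ln(1 + y), not y itself.
generate double ln1py = ln(1 + y)
regress ln1py x1 x2
rvfplot, yline(0)

* -glm- route: log link with robust standard errors
* (quasi-likelihood; family(poisson) is an assumption here).
* Stata notes noninteger values of y but still fits the model.
glm y x1 x2, family(poisson) link(log) vce(robust)

* Same model, reporting exponentiated coefficients,
* which can be read as ratios of the expected value of y.
glm y x1 x2, family(poisson) link(log) vce(robust) eform
```

With the log link, a coefficient b implies that a one-unit change in the covariate multiplies the expected value of `y` by exp(b); no comparable reading is available for the coefficients of the ln(1 + y) regression.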

**References**:

- **Re: st: Right skewed (positive) dependent variable** *From:* SURYADIPTA ROY <sroy9163@gmail.com>
- **Re: st: Right skewed (positive) dependent variable** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
