
From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: Right skewed (positive) dependent variable

Date: Thu, 10 Jun 2010 09:05:20 -0700

There is also the issue of the effect of outliers on ladder or boxcox. I just obtained my class grades. Here are some results:

                             totalscore
    -------------------------------------------------------------
          Percentiles      Smallest
     1%        26.74          26.74
     5%       41.942          35.95
    10%       49.298          38.95       Obs                  66
    25%       53.424         41.942       Sum of Wgt.          66

    50%       61.756                      Mean           60.36615
                            Largest       Std. Dev.        10.349
    75%       68.222         73.938
    90%       72.964         77.066       Variance       107.1017
    95%       73.938         79.508       Skewness       -.590668
    99%         80.4           80.4       Kurtosis       3.668101

\* The 26.74 is from a student who did not take the final and is likely an outlier.

    . ladder totalscore

    Transformation         formula               chi2(2)     P(chi2)
    ------------------------------------------------------------------
    cubic                  totals~e^3               2.13       0.345
    square                 totals~e^2               0.02       0.992
    identity               totals~e                 5.68       0.058
    square root            sqrt(totals~e)          12.30       0.002
    log                    log(totals~e)           21.22       0.000
    1/(square root)        1/sqrt(totals~e)        31.54       0.000
    inverse                1/totals~e              42.27       0.000
    1/square               1/(totals~e^2)          61.95       0.000
    1/cubic                1/(totals~e^3)              .       0.000

\* This suggests that the best transformation is to square totalscore. I don't regard this as a happy situation. So I exclude the low score.

    . ladder totalscore if totalscore>30

    Transformation         formula               chi2(2)     P(chi2)
    ------------------------------------------------------------------
    cubic                  totals~e^3               2.96       0.228
    square                 totals~e^2               0.70       0.705
    identity               totals~e                 0.77       0.681
    square root            sqrt(totals~e)           2.95       0.228
    log                    log(totals~e)            6.54       0.038
    1/(square root)        1/sqrt(totals~e)        11.18       0.004
    inverse                1/totals~e              16.76       0.000
    1/square               1/(totals~e^2)          29.23       0.000
    1/cubic                1/(totals~e^3)          41.40       0.000

\* Now the square and identity are about the same - I'd go with the identity.

For grading purposes, the centile command would give me a simple way of finding cutoffs. In fact, I had gone through the grades manually and came up with a set of letter grades that seemed to match the centiles pretty well. In my experience, students sort themselves into natural groups.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten buis
Sent: Thursday, June 10, 2010 8:51 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Right skewed (positive) dependent variable

--- On Thu, 10/6/10, SURYADIPTA ROY wrote:
> However, as I look at my program now, I discover
> the source of the anomaly- my transformation
> was newvar=ln(1+oldvar).. that explains.

Are there 0s in your dependent variable (oldvar)? If there are, then you really have no choice other than to go the -glm- route. There are ways of getting a meaningful interpretation out of a log transformed dependent variable, but no such way exists for the transformation log(oldvar + some constant), and leaving the constant out is no solution either, as that means that the 0s will be recoded to missing values.

This may also explain your non-normality: is there a spike at 0? If that is the case, then there can be no transformation that will lead to a normal distribution. In that case you could consider modeling the zeros separately using -zip-. It is usually used for counts, but can also be used for continuous variables in a quasi-likelihood kind of way, by specifying the -robust- option.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

\*
\* For searches and help try:
\* http://www.stata.com/help.cgi?search
\* http://www.stata.com/support/statalist/faq
\* http://www.ats.ucla.edu/stat/stata/
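
As a rough illustration of the -glm- and -zip- routes Maarten describes for a dependent variable with zeros, the lines below give a minimal sketch. Only oldvar comes from the thread; the covariates x1 and x2 are hypothetical. The Poisson family, log link, and the variables in inflate() are assumptions chosen for illustration, not something the thread specifies.

```stata
* Sketch only: x1 and x2 are hypothetical covariates; oldvar is the
* dependent variable named in the thread.

* How many observations would ln(oldvar) turn into missing values?
count if oldvar == 0

* Log-link GLM: models log(E[oldvar]) rather than E[log(oldvar)],
* so the zeros stay in the estimation sample.
* family(poisson) with robust standard errors is an assumed
* quasi-likelihood choice; oldvar must be nonnegative.
glm oldvar x1 x2, family(poisson) link(log) vce(robust)

* Same model, reporting exponentiated coefficients
* (multiplicative effects on the expected value of oldvar).
glm oldvar x1 x2, family(poisson) link(log) vce(robust) eform

* If there is a spike at 0, model the zeros separately.
* The inflate() variables here are an assumption.
zip oldvar x1 x2, inflate(x1 x2) vce(robust)
```

With the log link, a coefficient describes a proportional change in the mean of oldvar itself, which is the interpretation that is lost after ln(1+oldvar).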

**Follow-Ups**:
- **RE: st: Right skewed (positive) dependent variable** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

**References**:
- **Re: st: Right skewed (positive) dependent variable** *From:* SURYADIPTA ROY <sroy9163@gmail.com>
- **Re: st: Right skewed (positive) dependent variable** *From:* Maarten buis <maartenbuis@yahoo.co.uk>