

From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 11:35:38 -0700

Agreed. The point of the email was that outliers can affect the ladder routine as well as BoxCox. If I really was concerned about something like this, I'd consider a logit transformation or some such. One doesn't want to use a bulldozer to plant a daisy...

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, June 10, 2010 9:15 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Right skewed (positive) dependent variable

In principle this variable looks bounded by 0 and 100. In practice, that may not bite here, or perhaps even often, but it's important to note that -ladder- and friends have no intelligence to detect bounded variables and no scope for doing something special with such variables.

Nick
n.j.cox@durham.ac.uk

Lachenbruch, Peter

There is also the issue of the effect of outliers on ladder or boxcox. I just had my class grades obtained. Here are some results

                             totalscore
    -------------------------------------------------------------
          Percentiles      Smallest
     1%        26.74          26.74
     5%       41.942          35.95
    10%       49.298          38.95       Obs                  66
    25%       53.424         41.942       Sum of Wgt.          66

    50%       61.756                      Mean           60.36615
                            Largest       Std. Dev.        10.349
    75%       68.222         73.938
    90%       72.964         77.066       Variance       107.1017
    95%       73.938         79.508       Skewness       -.590668
    99%         80.4           80.4       Kurtosis       3.668101

* The 26.74 is from a student who did not take the final and is likely an outlier.

    . ladder totalscore

    Transformation         formula               chi2(2)       P(chi2)
    ------------------------------------------------------------------
    cubic                  totals~e^3               2.13        0.345
    square                 totals~e^2               0.02        0.992
    identity               totals~e                 5.68        0.058
    square root            sqrt(totals~e)          12.30        0.002
    log                    log(totals~e)           21.22        0.000
    1/(square root)        1/sqrt(totals~e)        31.54        0.000
    inverse                1/totals~e              42.27        0.000
    1/square               1/(totals~e^2)          61.95        0.000
    1/cubic                1/(totals~e^3)              .        0.000

* This suggests that the best transformation is a square to totalscore. I don't regard this as a happy situation. So I exclude the low score.

    . ladder totalscore if totalscore>30

    Transformation         formula               chi2(2)       P(chi2)
    ------------------------------------------------------------------
    cubic                  totals~e^3               2.96        0.228
    square                 totals~e^2               0.70        0.705
    identity               totals~e                 0.77        0.681
    square root            sqrt(totals~e)           2.95        0.228
    log                    log(totals~e)            6.54        0.038
    1/(square root)        1/sqrt(totals~e)        11.18        0.004
    inverse                1/totals~e              16.76        0.000
    1/square               1/(totals~e^2)          29.23        0.000
    1/cubic                1/(totals~e^3)          41.40        0.000

* Now the square and identity are about the same - I'd go with the identity.

For grading purposes, the centile command would give me a simple way of finding cutoffs - in fact, I had gone through the grades manually and came up with a set of letter grades that seemed to match the centiles pretty well. In my experience, students sort themselves into natural groups.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
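
The steps discussed in this thread could be sketched in Stata roughly as follows. This is only an illustration, not the posters' own code: it assumes a variable named totalscore, bounded by 0 and 100, is already in memory. The new variable names (prop, logitscore) and the centile percentages are hypothetical choices made for the example.

```stata
* Sketch only: assumes totalscore (bounded by 0 and 100) is in memory.

* Compare the ladder of powers with and without the suspect low score
ladder totalscore
ladder totalscore if totalscore > 30

* One way to respect the bounds: rescale to a proportion, then take the logit
* (prop and logitscore are hypothetical names; scores of exactly 0 or 100 are left missing)
generate double prop = totalscore/100 if totalscore > 0 & totalscore < 100
generate double logitscore = logit(prop)

* Skewness/kurtosis normality test on the transformed scores
sktest logitscore

* Centiles as candidate grade cutoffs (these percentages are arbitrary)
centile totalscore, centile(10 35 65 90)
```

The if qualifier on the second ladder call mirrors the exclusion used in the quoted message. Whether a logit transformation is worth the trouble depends on how close the scores come to the bounds.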

