Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: Right skewed (positive) dependent variable

From: "Nick Cox"
To:
Subject: RE: st: Right skewed (positive) dependent variable
Date: Thu, 10 Jun 2010 17:15:27 +0100

```In principle this variable looks bounded by 0 and 100.

In practice, that may not bite here, or perhaps even often, but it's
important to note that -ladder- and friends have no intelligence to
detect bounded variables and no scope for doing something special with
such variables.

Nick
n.j.cox@durham.ac.uk

Lachenbruch, Peter wrote:

There is also the issue of the effect of outliers on ladder or boxcox.

                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101

* The 26.74 is from a student who did not take the final and is likely
an outlier.

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.13        0.345
square                 totals~e^2              0.02        0.992
identity               totals~e                5.68        0.058
square root            sqrt(totals~e)         12.30        0.002
log                    log(totals~e)          21.22        0.000
1/(square root)        1/sqrt(totals~e)       31.54        0.000
inverse                1/totals~e             42.27        0.000
1/square               1/(totals~e^2)         61.95        0.000
1/cubic                1/(totals~e^3)             .        0.000

* This suggests that the best transformation is the square of totalscore.
I don't regard this as a happy situation, so I exclude the low score.

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.96        0.228
square                 totals~e^2              0.70        0.705
identity               totals~e                0.77        0.681
square root            sqrt(totals~e)          2.95        0.228
log                    log(totals~e)           6.54        0.038
1/(square root)        1/sqrt(totals~e)       11.18        0.004
inverse                1/totals~e             16.76        0.000
1/square               1/(totals~e^2)         29.23        0.000
1/cubic                1/(totals~e^3)         41.40        0.000

* Now the square and identity are about the same - I'd go with the
identity. For grading purposes, the centile command would give me a
simple way of finding cutoffs - in fact, I had gone through the grades
manually and come up with a set of letter grades that seemed to match
the centiles pretty well. In my experience, students sort themselves
into natural groups.

```
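The steps discussed in this thread can be sketched as a short do-file. This is only an illustration under stated assumptions: the data are simulated stand-ins, not the scores from the post; the variable names `lowscore` and `logit_score`, the seed, and the centile choices are hypothetical. The logit step is one common option for a variable bounded by 0 and 100, offered here as a possibility that -ladder- does not consider, not as something recommended in the thread.

```stata
* Simulated stand-in data: NOT the scores from the post.
clear
set seed 20100610
set obs 66
* Keep simulated scores strictly inside (0, 100) so log and logit are defined
generate totalscore = min(99, max(1, rnormal(60, 10)))
* Mimic one unusually low score and flag it (flag name is hypothetical)
replace totalscore = 26.74 in 1
generate byte lowscore = (_n == 1)

* Distribution summary, as in the -summarize, detail- output in the post
summarize totalscore, detail

* Ladder of powers on all observations, then without the flagged low score
ladder totalscore
ladder totalscore if !lowscore

* Centiles as candidate grade cutoffs (the centile choices are arbitrary)
centile totalscore, centile(10 25 50 75 90)

* -ladder- knows nothing about the bounds, so try a logit of the
* proportion by hand and test it for normality with -sktest-
generate logit_score = logit(totalscore/100)
sktest logit_score
```

Flagging the low score with an indicator, rather than writing `if totalscore > 26.74`, avoids the float-precision surprises that can arise when a stored float is compared with a decimal literal.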