RE: st: Right skewed (positive) dependent variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Right skewed (positive) dependent variable
Date   Thu, 10 Jun 2010 17:15:27 +0100

In principle this variable looks bounded by 0 and 100. 

In practice, that may not bite here, or perhaps even often, but it's
important to note that -ladder- and friends have no intelligence to
detect bounded variables and no scope for doing something special with
such variables. 
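
One thing that could be tried by hand for a variable bounded by 0 and 100 is a logit of the score rescaled to a proportion. What follows is only a sketch under assumptions, not something -ladder- does and not a recommendation from this thread. The eight values are the four smallest and four largest scores printed in the quoted -summarize- output below, used here purely so the example runs; the names score_prop and logit_score are made up for illustration, and 100 is assumed to be the maximum possible score.

```stata
* Sketch only: these eight values are the four smallest and four largest
* scores shown in the summarize output quoted in this thread,
* not the full class of 66.
clear
input double totalscore
26.74
35.95
38.95
41.942
73.938
77.066
79.508
80.4
end

* Rescale the 0-100 score to a proportion (assumes 100 is the maximum).
generate double score_prop = totalscore / 100

* logit() returns missing at exactly 0 or 1, so restrict to the open interval.
generate double logit_score = logit(score_prop) if score_prop > 0 & score_prop < 1

* Compare the raw and transformed scales.
summarize totalscore logit_score
```

Scores of exactly 0 or 100 would need separate handling, because the logit is undefined at the bounds.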

Nick 
n.j.cox@durham.ac.uk 

Lachenbruch, Peter wrote:

There is also the issue of the effect of outliers on ladder or boxcox.
I just obtained my class grades. Here are some results:

                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101

* The 26.74 is from a student who did not take the final and is likely
an outlier.

. ladder totalscore

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.13        0.345
square                 totals~e^2              0.02        0.992
identity               totals~e                5.68        0.058
square root            sqrt(totals~e)         12.30        0.002
log                    log(totals~e)          21.22        0.000
1/(square root)        1/sqrt(totals~e)       31.54        0.000
inverse                1/totals~e             42.27        0.000
1/square               1/(totals~e^2)         61.95        0.000
1/cubic                1/(totals~e^3)             .        0.000

* This suggests that the best transformation is the square of totalscore.
I don't regard this as a happy situation, so I exclude the low score.

. ladder totalscore if totalscore>30

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.96        0.228
square                 totals~e^2              0.70        0.705
identity               totals~e                0.77        0.681
square root            sqrt(totals~e)          2.95        0.228
log                    log(totals~e)           6.54        0.038
1/(square root)        1/sqrt(totals~e)       11.18        0.004
inverse                1/totals~e             16.76        0.000
1/square               1/(totals~e^2)         29.23        0.000
1/cubic                1/(totals~e^3)         41.40        0.000

* Now the square and identity are about the same - I'd go with the
identity.  For grading purposes, the centile command would give me a
simple way of finding cutoffs - in fact, I had gone through the grades
manually and came up with a set of letter grades that seemed to match
the centiles pretty well.   In my experience, students sort themselves
into natural groups.
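
As a rough illustration of the -centile- idea, here is a minimal sketch. The eight values are only the four smallest and four largest scores printed in the -summarize- output above, so the cutoffs they give mean nothing for actual grading; with the full class data in memory, only the -centile- line would be needed. The choice of centiles (10, 25, 50, 75, 90) is an assumption for illustration, not the set of cutoffs used for the letter grades.

```stata
* Sketch only: the four smallest and four largest scores from the
* summarize output above, standing in for the full class of 66.
clear
input double totalscore
26.74
35.95
38.95
41.942
73.938
77.066
79.508
80.4
end

* Estimate selected centiles, with confidence intervals, as candidate cutoffs.
centile totalscore, centile(10 25 50 75 90)

* centile saves the estimates in r(c_1), r(c_2), ... in the order requested.
display "estimated median: " r(c_3)
```

The saved results r(c_1) to r(c_5) can then be compared with cutoffs chosen by hand.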


