Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

RE: st: Right skewed (positive) dependent variable

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	RE: st: Right skewed (positive) dependent variable
Date	Thu, 10 Jun 2010 17:15:27 +0100

In principle this variable looks bounded by 0 and 100. 

In practice, that may not bite here, or perhaps even often, but it's
important to note that -ladder- and friends have no intelligence to
detect bounded variables and no scope for doing something special with
such variables. 
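
One common way to respect bounds like these is a logit transformation of the score rescaled to the interval (0, 1). This is a minimal sketch for illustration, not something proposed in this thread. It assumes totalscore lies strictly between 0 and 100, and logit_score is a hypothetical variable name.

```stata
* Sketch only: assumes totalscore lies strictly between 0 and 100.
* logit_score is a hypothetical name, not a variable from the thread.
generate double logit_score = logit(totalscore/100) ///
    if totalscore > 0 & totalscore < 100

* Inspect the transformed scale. sktest is the skewness/kurtosis test
* behind the chi2(2) column that -ladder- reports.
summarize logit_score, detail
sktest logit_score
```

Scores of exactly 0 or 100 would be left missing by this sketch. Whether that is acceptable depends on the data.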

Nick 
[email protected] 

Lachenbruch, Peter

There is also the issue of the effect of outliers on ladder or boxcox.
I have just computed my class grades. Here are some results:

                         totalscore
-------------------------------------------------------------
      Percentiles      Smallest
 1%        26.74          26.74
 5%       41.942          35.95
10%       49.298          38.95       Obs                  66
25%       53.424         41.942       Sum of Wgt.          66

50%       61.756                      Mean           60.36615
                        Largest       Std. Dev.        10.349
75%       68.222         73.938
90%       72.964         77.066       Variance       107.1017
95%       73.938         79.508       Skewness       -.590668
99%         80.4           80.4       Kurtosis       3.668101

* The 26.74 is from a student who did not take the final and is likely
an outlier.

. ladder totalscore

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.13        0.345
square                 totals~e^2              0.02        0.992
identity               totals~e                5.68        0.058
square root            sqrt(totals~e)         12.30        0.002
log                    log(totals~e)          21.22        0.000
1/(square root)        1/sqrt(totals~e)       31.54        0.000
inverse                1/totals~e             42.27        0.000
1/square               1/(totals~e^2)         61.95        0.000
1/cubic                1/(totals~e^3)             .        0.000

* This suggests that the best transformation is the square of totalscore.
I don't regard this as a happy situation, so I exclude the low score.

. ladder totalscore if totalscore>30

Transformation         formula               chi2(2)       P(chi2)
------------------------------------------------------------------
cubic                  totals~e^3              2.96        0.228
square                 totals~e^2              0.70        0.705
identity               totals~e                0.77        0.681
square root            sqrt(totals~e)          2.95        0.228
log                    log(totals~e)           6.54        0.038
1/(square root)        1/sqrt(totals~e)       11.18        0.004
inverse                1/totals~e             16.76        0.000
1/square               1/(totals~e^2)         29.23        0.000
1/cubic                1/(totals~e^3)         41.40        0.000

* Now the square and identity are about the same, so I'd go with the
identity. For grading purposes, the centile command would give me a
simple way of finding cutoffs. In fact, I had gone through the grades
manually and come up with a set of letter grades that seemed to match
the centiles pretty well. In my experience, students sort themselves
into natural groups.
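
The centile approach mentioned above might look like the following sketch. The percentile points (10, 25, 50, 75, 90) are illustrative assumptions, not the cutoffs actually used for these grades.

```stata
* Sketch only: the percentile points below are illustrative assumptions.
centile totalscore, centile(10 25 50 75 90)

* Same percentile points with the suspected outlier excluded,
* mirroring the second -ladder- run above.
centile totalscore if totalscore > 30, centile(10 25 50 75 90)
```

Comparing the two outputs shows how far the single low score moves the candidate cutoffs.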
