Re: st: ladder question for right-skewed variable

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: ladder question for right-skewed variable
Date	Fri, 26 Apr 2013 19:45:46 +0100

Three assertions based on a mix of experience and prejudice:

1. The best way to check for normality is with -qnorm-. Even if
normality is not your reference case, asymmetry will show up clearly
on a -qnorm- graph.

2. 90% of the time, choosing a transformation boils down to whether
one of three possibilities is any use: square root, logarithm or
reciprocal.

3. So, do-it-yourself is easy:

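* assumes myvar is strictly positive (log and reciprocal need that)
gen rtmyvar = sqrt(myvar)
gen logmyvar = log(myvar)
gen recmyvar = 1/myvar

* one named normal quantile plot per candidate scale
qnorm myvar, name(a)
qnorm rtmyvar, name(b)
qnorm logmyvar, name(c)
qnorm recmyvar, name(d)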

Not universally known fact: Giving a name to a graph means that it
sticks around until _you_ close it. So, you have four graphs on your
monitor. Arrange them with your mouse so you can compare. Usually it's
easy to pick what works best, without any formal machinery.

(Yes, I know about -gladder-, but this is simpler in practice.)
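
If arranging four windows by hand is awkward, the named graphs can also be put into a single window with -graph combine-. What follows is a minimal sketch, not part of the original advice. It assumes the auto dataset shipped with Stata, with -price- standing in for myvar purely for illustration. The titles and the name all4 are arbitrary choices.

```stata
* Illustration only: -price- from the shipped auto data stands in for myvar
sysuse auto, clear
gen myvar = price
gen rtmyvar = sqrt(myvar)
gen logmyvar = log(myvar)
gen recmyvar = 1/myvar

* name(..., replace) lets the block be rerun without "already exists" errors
qnorm myvar, name(a, replace) title("raw")
qnorm rtmyvar, name(b, replace) title("root")
qnorm logmyvar, name(c, replace) title("log")
qnorm recmyvar, name(d, replace) title("reciprocal")

* show the four named graphs side by side in one window
graph combine a b c d, name(all4, replace)
```

The scale whose points lie closest to the reference line is the most nearly normal of the four.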


Nick
[email protected]


On 26 April 2013 19:20, Nick Cox <[email protected]> wrote:
> Just to underline that the kurtosis of your variable was calculated by
> -summarize- as 108.8. That's BIG. No wonder -sktest- can't cope.
> Nick
> [email protected]
>
>
> On 26 April 2013 19:17, Nick Cox <[email protected]> wrote:
>> That's not quite "no transformations appeared in the output" as
>> -ladder- is signalling P-values for some cases.
>>
>> But I readily agree that -ladder- is not doing a good job here at all.
>>
>> In fact, I am now reminded of evident -ladder- problems shown in a
>> recent thread starting at
>> http://www.stata.com/statalist/archive/2013-02/msg00862.html
>>
>> I can't find a public email, even though I thought I posted on this,
>> but my impression from looking at the code is that -ladder- is
>> essentially fragile. The real problem here is within -sktest-. It can
>> break down, it seems, for large sample sizes and/or large deviations
>> from Gaussianity. Then it bounces back missings.
>>
>> I think you just need to abandon -ladder-. It's not essential. You
>> don't need _any_ test to tell you that some transformation will help
>> if the goal is to reduce asymmetry, and there are only a few credible
>> alternatives.
>>
>> As David and I pointed out, log transformation should work quite well
>> for your data,
>>
>> but but but: (my suggestion; David may not agree) why transform at
>> all? Your solutions start with -poisson- (or, for consenting adults,
>> -nbreg-).
>>
>> BTW, -ladder- is a command, not a function, and in Stata ne'er the
>> twain shall meet.
>>
>> Nick
>> [email protected]
>>
>>
>> On 26 April 2013 18:55, Gabriel Nelson <[email protected]> wrote:
>>> Thanks Nick, yes exactly, my question is why the ladder function fails
>>> to provide any chi-square values here. I'll attach the Stata output
>>> here:
>>>
>>> . ladder disp_2000
>>>
>>> Transformation         formula               chi2(2)       P(chi2)
>>> ------------------------------------------------------------------
>>> cubic                  dis~2000^3                 .            .
>>> square                 dis~2000^2                 .            .
>>> identity               dis~2000                   .            .
>>> square root            sqrt(dis~2000)             .        0.000
>>> log                    log(dis~2000)              .        0.000
>>> 1/(square root)        1/sqrt(dis~2000)           .        0.000
>>> inverse                1/dis~2000                 .        0.000
>>> 1/square               1/(dis~2000^2)             .        0.000
>>> 1/cubic                1/(dis~2000^3)             .        0.000
>>>
>>> . sum disp_2000, detail
>>>
>>>       Number displaced 2000 (if data unavailable go up
>>>                            to 2003
>>> -------------------------------------------------------------
>>>       Percentiles      Smallest
>>>  1%            1              1
>>>  5%            2              1
>>> 10%            3              1       Obs                1010
>>> 25%            6              1       Sum of Wgt.        1010
>>>
>>> 50%         15.5                      Mean           281.5297
>>>                         Largest       Std. Dev.      1217.168
>>> 75%           82           9421
>>> 90%        436.5           9505       Variance        1481497
>>> 95%         1251          16255       Skewness       9.012044
>>> 99%         5953          19569       Kurtosis       108.8061
>>>
>>> On Fri, Apr 26, 2013 at 10:47 AM, Nick Cox <[email protected]> wrote:
>>>> Please see my answers too. You have still not given the exact -ladder-
>>>> command you used or its output, so it is really difficult to know what
>>>> is going on.
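
The -poisson- or -nbreg- route suggested in the quoted thread can be sketched as follows. This is only an illustration on simulated data. The variables y, x1 and x2, the seed and all coefficient values are made up here and do not come from the poster's dataset. With the real data, the count disp_2000 would be the response and the predictors would be whatever the analysis calls for.

```stata
* Simulated, overdispersed count data (hypothetical; for illustration only)
clear
set seed 2013
set obs 1000
gen x1 = rnormal()
gen x2 = runiform()
* gamma heterogeneity with mean 1 makes the counts right-skewed
gen mu = exp(1 + 0.5*x1 + x2) * rgamma(2, 0.5)
gen y = rpoisson(mu)

* model the count directly instead of transforming it
poisson y x1 x2, vce(robust)

* negative binomial alternative, which estimates the overdispersion
nbreg y x1 x2
```

Both commands model the mean on the log scale, so the untransformed count is the response and no back-transformation of predictions is needed.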