

From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: ladder question for right-skewed variable
Date: Fri, 26 Apr 2013 08:49:03 +0100

In addition to David's good advice -- everyone should read his classic exposition

Hoaglin, D.C. 1988. Transformations in everyday experience. Chance 1(4): 40--45

-- a rough analysis is possible just from the nine quantiles shown by -summarize-. For something like this I fire up Mata as a friendly calculator:

    : y = (1,2,3,6,15.5, 82, 436.5, 1251,5953)'

    : strofreal((y , sqrt(y) , ln(y), -1:/y), "%3.2f")
                 1         2         3         4
        +-----------------------------------------+
      1 |      1.00      1.00      0.00     -1.00 |
      2 |      2.00      1.41      0.69     -0.50 |
      3 |      3.00      1.73      1.10     -0.33 |
      4 |      6.00      2.45      1.79     -0.17 |
      5 |     15.50      3.94      2.74     -0.06 |
      6 |     82.00      9.06      4.41     -0.01 |
      7 |    436.50     20.89      6.08     -0.00 |
      8 |   1251.00     35.37      7.13     -0.00 |
      9 |   5953.00     77.16      8.69     -0.00 |
        +-----------------------------------------+

The analysis could be extended by adding in the 4 smallest and 4 largest too, but this is enough to give a hint. The data are all positive, so all the standard transformations are candidates.

The results underline what could be guessed just by looking at the output of -summarize-.

1. Square root reduces skewness, but not by much.

2. (Negative) reciprocal just reverses the problem.

3. Logarithmic transformation looks the best bet, even though the distribution remains right skewed.

The evidence of the 4 largest values is that you have some outliers that are likely to remain moderate outliers on any reasonable transformation.

Caveats on various levels:

1. The assumption here is that transform of quantile = quantile of transform, which is solid in principle for monotonic transforms, but the small detail is that Stata averages adjacent order statistics to estimate quantiles, so you might see some small discrepancies.

2. I've not shown you reciprocal square root, not a transformation I find attractive, _unless_ there are dimensional grounds (from physics, engineering, ...) for square rooting. The variable sounds like a count, so that is ruled out if so.

3. Symmetry of marginal distribution is not a direct assumption for much, but in practice you are likely to find analyses easier if you transform a skew variable ...

4. ... or analyse it using an appropriate -glm-. You don't say what follows this, but -glm, link(log)- is what springs to mind.

There remains a mystery of why -ladder- didn't perform for you. You don't show for -ladder- _exactly_ what you typed or _exactly_ what Stata showed by way of results, but I can't see any reason for -ladder- not to perform here.

Nick
njcoxstata@gmail.com

On 26 April 2013 01:44, David Hoaglin <dchoaglin@gmail.com> wrote:
> Gabriel,
>
> The ratio of the largest value to the smallest value is quite large,
> so a transformation is likely to be useful. As a first step ("first
> aid"), I suggest that you try the logarithm (base 10).
>
> Usually the context of the data plays a role in the choice of a
> transformation, so that the result is meaningful. What is the nature
> of disp_2000?
>
> With 1010 observations you should check whether the data has some
> structure (e.g., two or more modes or groups), for example, by making
> a histogram with a sizable number of bins (say 25 or so). If you find
> structure, you will need to deal with that also.
>
> David Hoaglin
>
> On Thu, Apr 25, 2013 at 8:11 PM, Gabriel Nelson
> <lgabrielnelson@gmail.com> wrote:
>> I have a variable that is right-skewed. I used the ladder command
>> to see suggested transformations. However, no transformations appeared
>> in the output. I'm guessing that this does not mean the raw form is
>> better, because there is an option for 'raw' on this list.
>>
>> Here is the output for the sum, detail command for the variable:
>>
>> sum disp_2000, detail
>>
>>    Number displaced 2000 (if data unavailable go up to 2003
>> -------------------------------------------------------------
>>       Percentiles      Smallest
>>  1%            1              1
>>  5%            2              1
>> 10%            3              1       Obs                1010
>> 25%            6              1       Sum of Wgt.        1010
>>
>> 50%         15.5                      Mean           281.5297
>>                         Largest       Std. Dev.      1217.168
>> 75%           82           9421
>> 90%        436.5           9505       Variance        1481497
>> 95%         1251          16255       Skewness       9.012044
>> 99%         5953          19569       Kurtosis       108.8061
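The eyeball comparison of transformed quantiles in the reply above can be made slightly more quantitative. What follows is only a minimal sketch, not something proposed in the thread: it assumes a quartile-based (Bowley) skewness measure, (Q3 - 2*median + Q1) / (Q3 - Q1), applied to each column of the same Mata matrix. The names `t` and `bowley` are illustrative, and rows 4, 5 and 6 of `y` are taken to be the 25%, 50% and 75% percentiles from the -summarize, detail- output.

```stata
mata:
// the nine percentiles (1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, 99%) of disp_2000
y = (1, 2, 3, 6, 15.5, 82, 436.5, 1251, 5953)'

// columns: raw, square root, log, negative reciprocal
t = (y, sqrt(y), ln(y), -1:/y)

// quartile skewness per column: positive = right skewed, negative = left skewed
// rows 4, 5, 6 are the lower quartile, median and upper quartile
bowley = (t[6,.] - 2*t[5,.] + t[4,.]) :/ (t[6,.] - t[4,.])
bowley
end
```

Working this through by hand gives roughly 0.75 for the raw values, 0.55 for the square root, 0.27 for the logarithm and -0.32 for the negative reciprocal, which agrees with the three numbered conclusions in the reply. Because it uses only the quartiles, this measure says nothing about the extreme tails or the outliers among the 4 largest values.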

