Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: David Hoaglin <dchoaglin@gmail.com>
To: Rakesh Ghosh <rakeshgh@usc.edu>
Subject: Re: st: Scale break in box plot
Date: Mon, 23 Dec 2013 15:39:37 -0500

Dear Rakesh,

Any choice of constant as the multiplier of the IQR that defines the "fences" in a boxplot is to some degree arbitrary. John Tukey chose 1.5 partly because of its simplicity but also on the basis of considerable experience. The 1.5 is part of the definition of the standard boxplot.

I systematically discourage the use of versions of the boxplot other than the standard one. (The exceptions are "enhanced" displays that look clearly different from the usual boxplot, so that people viewing them realize that they need to ask about the details.) For some reason, people think they can modify the boxplot in whatever way they want without changing its general appearance. When that happens, viewers can't be sure what they are looking at or how to interpret the display.

You have not given enough information on the shape of your data for me to judge whether a transformation would be useful. The choice of scale (raw, log, square root, or some other) often has a substantial impact on the number of data values that are plotted individually at the high and low ends. If taking logs does not change the shape of your data in a useful way (e.g., by making the batch more nearly symmetric), then a logarithmic scale would not be a good choice. I saw "trafficdensity" and suggested logs because that transformation is helpful for various kinds of density data.

David Hoaglin

On Mon, Dec 23, 2013 at 12:30 PM, Rakesh Ghosh <rakeshgh@usc.edu> wrote:
> Dear David
>
> Thanks for the insights. I certainly agree that I need to look into the data more carefully. To me the 1.5-times criterion seems a bit arbitrary, at least in my case of traffic density as an exposure. I have looked into the data, and instead of using the 1.5-times rule I will present the plot with whiskers at the 1st and 99th percentiles. I prefer not to convert to a log scale because that would make the graph difficult to read. Moreover, I am not fitting a regression model, so I have the flexibility to keep it in the original scale.
>
> Thank you, and I appreciate you taking the time to respond to my question.
>
> Regards
> Rakesh
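For readers who want to see the standard rule concretely: in a standard boxplot, values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are plotted individually. The Python sketch below is only an illustration, not Stata's `graph box` code. It assumes quartiles computed by linear interpolation (`statistics.quantiles(..., method="inclusive")`), which can differ slightly from Tukey's hinges or Stata's own percentile definition, and the batch of powers of two is hypothetical. It shows how taking logs can change which values fall outside the fences, and includes a 1st/99th-percentile whisker rule (the helper `percentile_limits` is a hypothetical name) for comparison.

```python
import math
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Assumption: quartiles use linear interpolation (method="inclusive");
    Tukey's hinges and Stata's percentile rule may differ slightly.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outside_values(values, k=1.5):
    """Values beyond the fences, i.e. those a standard boxplot plots individually."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


def percentile_limits(values, lower_pct=1, upper_pct=99):
    """Hypothetical alternative: whisker ends at fixed percentiles (not a standard boxplot)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


if __name__ == "__main__":
    batch = [2 ** i for i in range(11)]  # hypothetical right-skewed batch: 1, 2, 4, ..., 1024
    print(outside_values(batch))                         # raw scale: [512, 1024]
    print(outside_values([math.log(v) for v in batch]))  # log scale: []
    print(percentile_limits(batch))                      # (1.1, 972.8)
```

On this made-up batch the raw scale puts 512 and 1024 outside the fences, while on the log scale the batch is symmetric and nothing is plotted individually. A percentile rule, by contrast, leaves roughly 2% of observations beyond the whisker ends in a large batch without heavy ties, whatever the shape of the data. This is one reason such a plot should not be presented as if it were a standard boxplot.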

