
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Scale break in box plot


From   David Hoaglin <dchoaglin@gmail.com>
To   Rakesh Ghosh <rakeshgh@usc.edu>
Subject   Re: st: Scale break in box plot
Date   Mon, 23 Dec 2013 15:39:37 -0500

Dear Rakesh,

Any choice of constant as the multiplier of the IQR that defines the
"fences" in a boxplot is to some degree arbitrary.  John Tukey chose
1.5 in part because of its simplicity but also on the basis of
considerable experience.  The 1.5 is part of the definition of the
standard boxplot.
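To make the role of the 1.5 multiplier concrete, here is a minimal Python sketch of the standard fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Values beyond them are the ones a standard boxplot plots individually. The helper names `tukey_fences` and `outside_values` are hypothetical. The quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which is an assumption: Tukey's hinges and Stata's `graph box` may use a slightly different quartile convention.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: quartiles use statistics.quantiles(method="inclusive");
    other software may define quartiles slightly differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outside_values(values, k=1.5):
    """Values beyond the fences, i.e. those a standard boxplot plots individually."""
    lo, hi = tukey_fences(values, k)
    return [v for v in values if v < lo or v > hi]


batch = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical values, not real data
print(tukey_fences(batch))    # (-3.0, 13.0)
print(outside_values(batch))  # [100]
```

Changing `k` changes which values are flagged. That is why a boxplot drawn with a different multiplier no longer means what a viewer of a standard boxplot expects.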

I systematically discourage the use of versions of the boxplot other
than the standard one.  (The exceptions are "enhanced" displays that
look clearly different from the usual boxplot, so that people viewing
them realize that they need to ask about the details.)  For some
reason, people think they can modify the boxplot in whatever way they
want without changing its general appearance.  When that happens,
viewers can't be sure what they are looking at or how to interpret the
display.

You have not given enough information on the shape of your data for me
to judge whether a transformation would be useful.  The choice of
scale (raw, log, square root, or some other) often has a substantial
impact on the number of data values that are plotted individually at
the high and low ends.  If taking logs does not change the shape of
your data in a useful way (e.g., by making the batch more nearly
symmetric), then a logarithmic scale would not be a good choice.  I
saw "trafficdensity" and suggested logs because that transformation is
helpful for various kinds of density data.
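The effect of the choice of scale on the number of individually plotted values can be checked directly. The sketch below counts values beyond the standard fences for a batch on the raw scale and on the log scale. The batch (powers of 2) is purely illustrative and strongly right-skewed. It is not traffic-density data. The helper `count_outside` is hypothetical, and quartiles again follow `statistics.quantiles(method="inclusive")`.

```python
import math
import statistics


def count_outside(values, k=1.5):
    """Count values beyond Tukey's fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < lo or v > hi)


# Hypothetical right-skewed batch: 1, 2, 4, ..., 512.
raw = [2 ** i for i in range(10)]
# On the log scale these values are evenly spaced, hence symmetric.
logged = [math.log(v) for v in raw]

print(count_outside(raw), count_outside(logged))  # 1 0
```

Here the log transformation makes the batch symmetric and removes the outside value. For data whose shape logs do not improve, the counts need not move this way, which is the point about checking the shape first.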

David Hoaglin

On Mon, Dec 23, 2013 at 12:30 PM, Rakesh Ghosh <rakeshgh@usc.edu> wrote:
> Dear David
>
> Thanks for the insights. I certainly agree that I need to look at the data more carefully. To me the 1.5 × IQR criterion seems a bit arbitrary, at least in my case of traffic density as an exposure. I have looked at the data, and instead of using the 1.5 × IQR fences I will draw the whiskers at the 1st and 99th percentiles. I prefer not to convert to a log scale because that would make the graph difficult to read. Moreover, I am not fitting a regression model, so I have the flexibility to keep the original scale.
>
> Thank you; I appreciate your taking the time to respond to my question.
>
> Regards
> Rakesh
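For comparison with the standard fences, the 1st/99th-percentile whiskers proposed above can be sketched as follows. The helper names `percentile_whiskers` and `beyond_whiskers` are hypothetical. Percentiles come from `statistics.quantiles(n=100, method="inclusive")`, which may differ from Stata's own percentile definitions. As noted in the reply, such a display is not a standard boxplot and should be labeled so that viewers know how the whiskers were defined.

```python
import statistics


def percentile_whiskers(values, lower=1, upper=99):
    """Return whisker ends at the given integer percentiles (non-standard boxplot).

    Assumption: percentiles use statistics.quantiles(n=100, method="inclusive").
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    # cuts[0] is the 1st percentile, cuts[98] is the 99th percentile.
    return cuts[lower - 1], cuts[upper - 1]


def beyond_whiskers(values, lower=1, upper=99):
    """Values that would be plotted individually under percentile whiskers."""
    lo, hi = percentile_whiskers(values, lower, upper)
    return [v for v in values if v < lo or v > hi]


batch = list(range(101))  # hypothetical values 0, 1, ..., 100
print(percentile_whiskers(batch))  # (1.0, 99.0)
print(beyond_whiskers(batch))      # [0, 100]
```

One design consequence: with percentile whiskers, roughly 1% of a large batch falls beyond each whisker by construction, whatever the shape of the data. The 1.5 × IQR fences flag values only when the tails are long relative to the middle of the batch.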

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC