Re: st: Scale break in box plot


From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Scale break in box plot
Date: Mon, 16 Dec 2013 22:37:22 -0500

Dear Rakesh,

If you would like to insert a break in the scale, my reaction (based
on more than 40 years of experience with boxplots) is that the data
may be suggesting that you do something different.

Observations that are plotted individually at the ends of a boxplot
are not necessarily "outliers."  In samples of well-behaved data
(i.e., from a normal distribution), the standard definition of the
boxplot causes observations to be plotted individually more often than
one would expect if they were truly outliers.  Hoaglin et al. (1986) and Hoaglin and
Iglewicz (1987) give some further information.  In Exploratory Data
Analysis such observations are simply referred to as "outside."  The
idea is to give them special attention, to see whether some reason
accounts for their being "outside."
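
As a rough illustration of that point, the sketch below simulates normal data in Stata and counts how many values fall beyond the usual fences (the quartiles plus or minus 1.5 times the interquartile range). The seed, the sample size, and the variable name x are arbitrary choices made for illustration; nothing here comes from the original data.

```stata
* Illustrative simulation; seed, sample size, and variable name are assumptions
clear
set seed 20131216
set obs 200
generate x = rnormal()

* Quartiles and the 1.5*IQR step used by the standard boxplot rule
quietly summarize x, detail
local q1 = r(p25)
local q3 = r(p75)
local step = 1.5 * (`q3' - `q1')

* Observations beyond the fences are the ones plotted individually
count if x < `q1' - `step'
count if x > `q3' + `step' & !missing(x)

graph box x
```

Rerunning with different seeds often shows a few individually plotted points even though every value was drawn from the same normal distribution.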

It would be helpful to know more about the data on which your boxplot is based:
What is the variable?
How many observations?
How many observations are "outside" at the low end?
How many observations are "outside" at the high end?
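
One way to answer the counting questions is sketched below. It assumes the variables trafficdensity and county from the original post are in memory; the names q1, q3, step, out_low, and out_high are hypothetical helpers introduced only for this example.

```stata
* Assumes trafficdensity and county are in memory;
* q1, q3, step, out_low, and out_high are hypothetical names
egen q1 = pctile(trafficdensity), p(25) by(county)
egen q3 = pctile(trafficdensity), p(75) by(county)
generate step = 1.5 * (q3 - q1)

* Indicators for observations beyond the lower and upper fences
generate byte out_low  = trafficdensity < q1 - step if !missing(trafficdensity)
generate byte out_high = trafficdensity > q3 + step if !missing(trafficdensity)

* Number of observations and counts "outside" at each end, by county
tabstat trafficdensity out_low out_high, by(county) statistics(n sum)
```

The quartiles here are computed from all nonmissing values in each county. If the plot is restricted (for example, to trafficdensity > 0), the same -if- condition would need to be added to the -egen- lines.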

If, for example, all the "outside" observations are at the high end,
and they seem to be part of a skewed pattern, you may want to consider
applying a transformation, such as the logarithm or the square root.
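
A minimal sketch of that suggestion, again assuming trafficdensity and county are in memory. The new variable names log_td and sqrt_td are hypothetical, and the axis titles are illustrative.

```stata
* Assumes trafficdensity and county are in memory; new variable names are hypothetical
* The logarithm is defined only for positive values
generate log_td  = ln(trafficdensity) if trafficdensity > 0
generate sqrt_td = sqrt(trafficdensity) if trafficdensity >= 0

* Boxplots on the transformed scales, one box per county
graph box log_td, over(county) ytitle("log of traffic density")
graph box sqrt_td, over(county) ytitle("square root of traffic density")
```

Because no upper limit is imposed, the median and quartiles in each box are based on all of the (positive) observations rather than on a truncated subset.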

I hope this information is helpful.

David Hoaglin

Hoaglin DC, Iglewicz B, Tukey JW (1986).  Performance of some
resistant rules for outlier labeling.  Journal of the American
Statistical Association 81:991-999.

Hoaglin DC, Iglewicz B (1987).  Fine-tuning some resistant rules for
outlier labeling.  Journal of the American Statistical Association
82:1147-1149.


On Mon, Dec 16, 2013 at 2:24 PM, Rakesh Ghosh <rakeshgh@usc.edu> wrote:
> Dear Stata list members
>
> I have a box plot with many outliers. I would like to insert a scale break to increase the box size and reduce the span of the outliers. I tried both of the options in this Stata scale break link (http://www.stata.com/support/faqs/graphics/scale-breaks/). While inserting a line will not work in my case because I have no break in data points, the second option does work when I create a box plot and a scatter plot and then combine them together.
>
> -graph box trafficdensity if trafficdensity>0 & trafficdensity<=125, over(county)-
>
> However, the median, p25 and p75 are underestimated because I restrict the upper limit of the box plot, so it is not good for me. I will have to restrict the upper limit otherwise I will not get the plot of desirable size. Is there any way you can think how I can insert a break on the y axis?
>
> Thanks for any suggestion.
>
> Rakesh Ghosh