# Re: st: Scale break in box plot

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Scale break in box plot
Date: Mon, 16 Dec 2013 22:37:22 -0500

```
Dear Rakesh,

If you would like to insert a break in the scale, my reaction (based
on more than 40 years of experience with boxplots) is that the data
may be suggesting that you do something different.

Observations that are plotted individually at the ends of a boxplot
are not necessarily "outliers."  Even in samples of well-behaved data
(e.g., from a normal distribution), the standard definition of the
boxplot plots some observations individually, and it does so more
often than one would expect if only true outliers were flagged.
Hoaglin et al. (1986) and Hoaglin and Iglewicz (1987) give further
information.  In Exploratory Data Analysis such observations are
simply referred to as "outside."  The idea is to give them special
attention, to see whether some reason accounts for their being
"outside."

What is the variable?
How many observations?
How many observations are "outside" at the low end?
How many observations are "outside" at the high end?

If, for example, all the "outside" observations are at the high end,
and they seem to be part of a skewed pattern, you may want to consider
applying a transformation, such as the logarithm or the square root.

I hope this information is helpful.

David Hoaglin

Hoaglin DC, Iglewicz B, Tukey JW (1986).  Performance of some
resistant rules for outlier labeling.  Journal of the American
Statistical Association 81:991-999.

Hoaglin DC, Iglewicz B (1987).  Fine-tuning some resistant rules for
outlier labeling.  Journal of the American Statistical Association
82:1147-1149.

On Mon, Dec 16, 2013 at 2:24 PM, Rakesh Ghosh <rakeshgh@usc.edu> wrote:
>>>> Dear Stata list members
>>>>
>>>> I have a box plot with many outliers. I would like to insert a scale break to increase the box size and reduce the span of the outliers. I tried both of the options in this Stata scale break link (http://www.stata.com/support/faqs/graphics/scale-breaks/). While inserting a line will not work in my case because I have no break in data points, the second option does work when I create a box plot and a scatter plot and then combine them together.
>>>>
>>>> -graph box trafficdensity if trafficdensity>0 & trafficdensity<=125, over(county)-
>>>>
>>>> However, the median, p25 and p75 are underestimated because I restrict the upper limit of the box plot, so it is not good for me. I will have to restrict the upper limit otherwise I will not get the plot of desirable size. Is there any way you can think how I can insert a break on the y axis?
>>>>
>>>> Thanks for any suggestion.
>>>>
>>>> Rakesh Ghosh

```
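
The questions in the reply (how many observations are "outside" at each end, and whether a transformation helps) can be explored with a short Stata sketch. It assumes the variable names `trafficdensity` and `county` from Rakesh's command; the names `touse`, `q1`, `q3`, `out_low`, `out_high`, and `log_td` are hypothetical, introduced only for illustration. The fences use the usual 1.5 × IQR rule, and the counts may differ slightly from what `graph box` draws if quartiles are computed differently.

```stata
* Sketch only: assumes trafficdensity and county exist as in the original command.
* Mark the positive, nonmissing observations (missing values compare as large in Stata).
generate byte touse = trafficdensity > 0 & !missing(trafficdensity)

* Quartiles within each county.
egen double q1 = pctile(trafficdensity) if touse, p(25) by(county)
egen double q3 = pctile(trafficdensity) if touse, p(75) by(county)

* Flag observations beyond the 1.5*IQR fences at the low and high ends.
generate byte out_low  = trafficdensity < q1 - 1.5*(q3 - q1) if touse
generate byte out_high = trafficdensity > q3 + 1.5*(q3 - q1) if touse

* Number of "outside" observations (sum) and group size (n) by county.
tabstat out_low out_high if touse, by(county) statistics(sum n)

* If the "outside" points sit at the high end, try the log scale
* instead of a scale break; quartiles are then computed on all the data.
generate double log_td = ln(trafficdensity) if touse
graph box log_td, over(county) ytitle("log of traffic density")
```

If most of the flagged observations fall at the high end and the log-scale boxes look roughly symmetric, that suggests the transformation is handling the skewness, and no observations need to be dropped to make the boxes readable.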