Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Scale break in box plot
Date: Mon, 16 Dec 2013 22:37:22 -0500

Dear Rakesh,

If you would like to insert a break in the scale, my reaction (based on more than 40 years of experience with boxplots) is that the data may be suggesting that you do something different.

Observations that are plotted individually at the ends of a boxplot are not necessarily "outliers." In samples of well-behaved data (i.e., from a normal distribution), the standard definition of the boxplot causes observations to be plotted individually more often than if they were truly outliers. Hoaglin et al. (1986) and Hoaglin and Iglewicz (1987) give some further information. In Exploratory Data Analysis such observations are simply referred to as "outside." The idea is to give them special attention, to see whether some reason accounts for their being "outside."

It would be helpful to know more about the data on which your boxplot is based:

- What is the variable?
- How many observations?
- How many observations are "outside" at the low end?
- How many observations are "outside" at the high end?

If, for example, all the "outside" observations are at the high end, and they seem to be part of a skewed pattern, you may want to consider applying a transformation, such as the logarithm or the square root.

I hope this information is helpful.

David Hoaglin

Hoaglin DC, Iglewicz B, Tukey JW (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association 81:991-999.

Hoaglin DC, Iglewicz B (1987). Fine-tuning some resistant rules for outlier labeling. Journal of the American Statistical Association 82:1147-1149.

On Mon, Dec 16, 2013 at 2:24 PM, Rakesh Ghosh <rakeshgh@usc.edu> wrote:
> Dear Stata list members
>
> I have a box plot with many outliers. I would like to insert a scale break to increase the box size and reduce the span of the outliers. I tried both of the options in this Stata scale break link (http://www.stata.com/support/faqs/graphics/scale-breaks/). While inserting a line will not work in my case because I have no break in data points, the second option does work when I create a box plot and a scatter plot and then combine them together.
>
> -graph box trafficdensity if trafficdensity>0 & trafficdensity<=125, over(county)-
>
> However, the median, p25 and p75 are underestimated because I restrict the upper limit of the box plot, so it is not good for me. I will have to restrict the upper limit otherwise I will not get the plot of desirable size. Is there any way you can think how I can insert a break on the y axis?
>
> Thanks for any suggestion.
>
> Rakesh Ghosh
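
The questions in the reply above (how many observations, and how many are "outside" at each end) and the suggested logarithmic transformation could be explored along the following lines. This is only a rough sketch, not code from either poster. It assumes the variables trafficdensity and county from the quoted command, and it assumes the usual 1.5 x IQR rule for deciding which observations are plotted individually. The names p25, p75, iqr, low_out, high_out and log_td are hypothetical and must not already exist in the dataset.

```stata
* Sketch only: trafficdensity and county come from the quoted post;
* p25, p75, iqr, low_out, high_out and log_td are hypothetical names.

* Quartiles and interquartile range within each county
egen p25 = pctile(trafficdensity), p(25) by(county)
egen p75 = pctile(trafficdensity), p(75) by(county)
generate iqr = p75 - p25

* Flag observations beyond the 1.5 x IQR fences (assumed rule);
* the if qualifier leaves the flags missing where the variable is missing
generate byte low_out  = trafficdensity < p25 - 1.5*iqr if !missing(trafficdensity)
generate byte high_out = trafficdensity > p75 + 1.5*iqr if !missing(trafficdensity)

* Number of observations and number "outside" at each end, by county
tabstat low_out high_out, by(county) statistics(n sum)

* If the "outside" values sit at the high end of a skewed pattern,
* a box plot of the logarithm may need no scale break
* (positive values only, as in the quoted command)
generate log_td = ln(trafficdensity) if trafficdensity > 0
graph box log_td, over(county) ytitle("ln(traffic density)")
```

For a square-root version, sqrt(trafficdensity) can replace ln(trafficdensity). Either way the quartiles are computed from all the data rather than from a truncated subset, which avoids the underestimated median, p25 and p75 described in the original question.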

**Follow-Ups**: **Re: st: Scale break in box plot**, *From:* Nick Cox <njcoxstata@gmail.com>

**References**: **st: Scale break in box plot**, *From:* Rakesh Ghosh <rakeshgh@usc.edu>
