Statalist The Stata Listserver



RE: st: Definition of "outside" in box plots - new reference


From   "Allan Reese (Cefas)" <allan.reese@cefas.co.uk>
From   "Jens Lauritsen" <jl@epidata.dk>, quotes@hsph.harvard.edu
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Definition of "outside" in box plots - new reference
Date   Wed, 24 May 2006 14:42:39 +0100

"Outlier Labeling With Boxplot Procedures"
C. H. SIM, F. F. GAN, and T. C. CHANG.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 100 (470): 642-652 JUN 2005

"... We recommend that the graphical
boxplot be constructed based on the knowledge of the underlying
distribution of the dataset and by controlling the risk of labeling regular
observations as outliers."
------------

I raised this last year with StataCorp and got a positive reply.  The messages are appended to help Jens' researches.
(R A Reese. Toolkit: boxplots.  Significance Vol 2 (2005) issue 3 134-135.)

Sim and friends seem to have lost the plot, in that box-and-whisker plots (B&Ws) are a visual way to examine data, not a significance test.  If you *know* what the distribution is, why plot the data?  Real data never actually come from these neat, exact distributions, so we need a flag to direct attention to results that merit further investigation.

Allan
------------

19/04/05
I'm writing about boxplots and checked Tukey (EDA 1977) against the Stata 8 Graphics book.
Tukey writes (p44): "the value at each end closest to, but still inside, the inner fence is 'adjacent'".
Stata writes (p159): "The upper adjacent value is defined as x_i such that x_i <= U".

My reading is that Tukey suggests < rather than <=.  The fences cut off, on a normal distribution, roughly the upper and lower 0.35% tails (about 0.7% of observations in total), so are quite generous.  So Stata would not mark points on the fences as outliers, but Tukey would draw shorter whiskers and would mark more outliers for the same data.
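
The tail area beyond each fence under a normal distribution can be checked numerically.  What follows is a minimal sketch, not anything from the packages discussed: Python is an assumption made here for illustration, and the variable names are hypothetical.  It places the fences at 1.5 times the interquartile range beyond the quartiles of a standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()             # mean 0, standard deviation 1
q3 = std_normal.inv_cdf(0.75)         # upper quartile, about 0.6745
iqr = 2 * q3                          # by symmetry, Q1 = -Q3
upper_fence = q3 + 1.5 * iqr          # about 2.698 standard deviations
tail = 1 - std_normal.cdf(upper_fence)  # area beyond the upper fence

print(f"upper fence = {upper_fence:.3f} sd")
print(f"area beyond one fence = {tail:.4%}")
```

Under these assumptions the area beyond one fence comes out at about 0.35%, or roughly 0.7% for both fences together.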

I've tried Stata, SPSS and SAS, all with the trivial dataset
-268 -67 0 67 268

Here the quartiles are -67 and 67, so the upper fence is 67 + 1.5 x 134 = 268, exactly the largest value.  All draw the whisker to 268, but change 268 to 268.00001 and no whisker is drawn; the upper value is marked as an outlier.  Stata documentation is inconsistent with Tukey (1977); the other packages claim to follow his rule but clearly do not.  On that basis, Stata has a feature, the others have a bug!
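
The difference between the two rules on this dataset can be sketched in a few lines.  This is an illustration only, written in Python rather than in any of the packages tested; the function name `adjacent_and_outside` and its `strict` flag are hypothetical.  It assumes quartiles computed by the "inclusive" method of Python's `statistics.quantiles`, which gives -67 and 67 for these five values; other quartile definitions may give different fences.

```python
from statistics import quantiles

def adjacent_and_outside(values, strict=False):
    """Return (fences, adjacent values, outside values) for a box plot.

    strict=False: a point on a fence counts as inside (x <= U),
                  the rule quoted from the Stata manual.
    strict=True:  a point on a fence counts as outside (x < U),
                  the reading of Tukey discussed above.
    """
    data = sorted(values)
    # Assumption: "inclusive" quartiles; packages differ on this detail.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)
    lower, upper = q1 - step, q3 + step
    if strict:
        inside = [x for x in data if lower < x < upper]
    else:
        inside = [x for x in data if lower <= x <= upper]
    outside = [x for x in data if x not in inside]
    return (lower, upper), (min(inside), max(inside)), outside

data = [-268, -67, 0, 67, 268]
print(adjacent_and_outside(data))               # <= rule
print(adjacent_and_outside(data, strict=True))  # < rule
print(adjacent_and_outside([-268, -67, 0, 67, 268.00001]))
```

With the <= rule the whiskers reach -268 and 268 and nothing is outside; with the < rule the adjacent values are -67 and 67 and both extremes are flagged.  Nudging the top value to 268.00001 leaves 67 as the upper adjacent value under either rule, which is why no upper whisker appears.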


Allan,

You are correct, the documentation is inconsistent with Tukey.  If it hasn't
been done already, I'll submit it for a manual change.  Thanks for bringing
this to our attention.
Sincerely,

Derek Wagner



