



st: RE: boxplot whiskers with -lv- versus -adjacent-


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: boxplot whiskers with -lv- versus -adjacent-
Date   Tue, 11 May 2010 16:08:46 +0100

With the usual J.W. Tukey rule, the ends of the whiskers should not be the limits of the interval

[Q1 - 1.5 IQR, Q3 + 1.5 IQR] 

but the most extreme observed data values within that interval. Typically, the data may well fall a bit, or even a lot, short of those limits on either or both sides. For example, it is even possible that the maximum is the same as the computed upper quartile, or the minimum the same as the computed lower quartile, if there are ties. Even without ties, the interval above could easily include _all_ the data values, in which case the whiskers end at the minimum and maximum. 

Of course, if you have data points exactly 1.5 IQR from both quartiles, there will be no difference. 
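
A minimal sketch of the rule just described, assuming the auto data shipped with Stata and the quartiles returned by -summarize, detail-. Other commands may define quartiles (or fourths) slightly differently, so the results need not match -lv-, -adjacent-, or -graph box- exactly. The local macro names are arbitrary.

```stata
* Illustration only: Tukey-style whisker ends for one variable
sysuse auto, clear
quietly summarize price, detail
local iqr = r(p75) - r(p25)
local lofence = r(p25) - 1.5 * `iqr'
local upfence = r(p75) + 1.5 * `iqr'

* Whisker ends are the most extreme OBSERVED values inside the fences,
* not the fences themselves
quietly summarize price if inrange(price, `lofence', `upfence')
display "fences:          `lofence'  `upfence'"
display "adjacent values: " r(min) "  " r(max)
```

The second -summarize- is the step that matters: it looks for observed values between the fences rather than truncating the fences at the minimum and maximum.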

-adjacent- is a user-written command from SSC. The author is a close contact and tells me that its intent was to match the above. 

I don't think the help for -graph box- commits itself either way, but the help for -adjacent- does not give the definition you imply.

-lv- is quite different here: its intent is not to match boxplots, but to show you the fences, which don't necessarily correspond to boxplot whisker ends. Consider this example for the auto data. 

. lv price

 #     69                  Price
             ---------------------------------
 M     35   |                5,079            |    spread  pseudosigma
 F     18   |     4,195      5,249      6,303 |     2,108    1,598.33
 E      9.5 |   3,989.5   7,021.25     10,053 |   6,063.5    2,716.63
 D      5   |     3,798    7,896.5     11,995 |     8,197    2,739.37
 C      3   |     3,667    8,630.5     13,594 |     9,927     2,806.1
 B      2   |     3,299    8,899.5     14,500 |    11,201    2,833.27
 A      1.5 |     3,295      9,249     15,203 |    11,908    2,802.94
        1   |     3,291    9,598.5     15,906 |    12,615    2,712.02
            |                                 |
            |                                 |   # below     # above
inner fence |     1,033                 9,465 |         0          11
outer fence |    -2,129                12,627 |         0           4

The corresponding box plot does not show any negative whisker end. 
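
As a check on the arithmetic, the fences in the -lv- output above can be recomputed from the fourths in row F. This sketch assumes only the numbers printed in that table; the local macro names are arbitrary.

```stata
* Lower and upper fourths (row F) copied from the -lv price- output above
local lofourth = 4195
local upfourth = 6303
local spread = `upfourth' - `lofourth'

* Inner fences lie 1.5 spreads beyond the fourths, outer fences 3 spreads
local inner_lo = `lofourth' - 1.5 * `spread'
local inner_hi = `upfourth' + 1.5 * `spread'
local outer_lo = `lofourth' - 3 * `spread'
local outer_hi = `upfourth' + 3 * `spread'

display "inner fences: `inner_lo' and `inner_hi'"
display "outer fences: `outer_lo' and `outer_hi'"
```

These reproduce the table's fences of 1,033 and 9,465 (inner) and -2,129 and 12,627 (outer). The smallest price in the table is 3,291 (depth 1), which is above the lower inner fence, so a lower whisker drawn by the Tukey rule stops at 3,291 and never reaches the fence, let alone a negative value.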

None of the commands named above makes any use of a percentile criterion for whiskers. 

In short, this all looks to me like a matter of misunderstandings. 

Nick 
n.j.cox@durham.ac.uk 

William M. Doerner wrote:

Does anybody know what the -graph box- and -adjacent- commands are
computing for their whiskers?  They aren't using the typical +-1.5*IQR
formula as listed in the help files and used by -lv-.

The command -adjacent- adjusts the fence for min/max values, but the
adjacent values are not lower=Q1-1.5*IQR and upper=Q3+1.5*IQR.  The
command -lv- computes the adjacent values with that formula, but it
does not adjust the fence for min/max values.  I am puzzled.  I looked
at the code, but I couldn't figure out why the commands produce
different output.  Here is what I was running:

**BEGIN**
use http://www.stata-press.com/data/r11/bplong, clear
*graph box bp, over(when) over(sex)
keep if sex==1 & when==2
summarize bp, detail
* fences at 1.5 IQR beyond the quartiles
local u=r(p75)+(3/2)*(r(p75)-r(p25))
local l=r(p25)-(3/2)*(r(p75)-r(p25))
* truncate each fence at the observed minimum/maximum
local l=max(`l',r(min))
local u=min(`u',r(max))

di `u'
di `l'

adjacent bp
lv bp
graph box bp, ylabel(#50, angle(horizontal))
*Notice how the upper limit should be 173 instead of 169.
**END**

The difference between the commands is not as simple as "it's 5% and
95%."  That looks true for the bp dataset, but it doesn't happen with
the city temperature dataset.  Here is more code to compare:

**BEGIN**
use http://www.stata-press.com/data/r11/bplong, clear
local x "if sex==1 & when==2 & bp~=."
summarize bp `x', detail
adjacent bp `x'
lv bp `x'

sysuse citytemp.dta, clear
summarize tempjuly, detail
adjacent tempjuly
lv tempjuly
**END**



