

From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: boxplot whiskers with -lv- versus -adjacent-
Date: Tue, 11 May 2010 16:08:46 +0100

With the usual J.W. Tukey rule, the ends of the whiskers should be not [Q1 - 1.5 IQR, Q3 + 1.5 IQR] but the most extreme observed data values within that interval. Typically, the data may well fall a bit, or even a lot, short of those limits on either or both sides. For example, it is even possible that the maximum is the same as the computed upper quartile or the minimum is the same as the computed lower quartile, if there are ties. Even without ties, that interval above could easily include _all_ the data values. Of course, if you have data points exactly 1.5 IQR from both quartiles, there will be no difference.

-adjacent- is a user-written command from SSC. The author is a close contact and tells me that its intent was to match the above. I don't think the help for -graph box- implies either way, but the help for -adjacent- does not give the definition you imply.

-lv- is quite different here: its intent is not to match boxplots, but to show you the fences, which don't necessarily correspond to boxplot whisker ends. Consider this example for the auto data.

    . lv price

     #     69                      Price
               ---------------------------------
     M     35  |             5,079             |      spread  pseudosigma
     F     18  |    4,195    5,249     6,303   |       2,108     1,598.33
     E    9.5  |  3,989.5  7,021.25   10,053   |     6,063.5     2,716.63
     D      5  |    3,798   7,896.5   11,995   |       8,197     2,739.37
     C      3  |    3,667   8,630.5   13,594   |       9,927      2,806.1
     B      2  |    3,299   8,899.5   14,500   |      11,201     2,833.27
     A    1.5  |    3,295     9,249   15,203   |      11,908     2,802.94
            1  |    3,291   9,598.5   15,906   |      12,615     2,712.02
               |                               |
               |                               |     # below      # above
    inner fence |   1,033              9,465   |           0           11
    outer fence |  -2,129             12,627   |           0            4

The corresponding box plot does not show any negative whisker end.

None of the commands named above makes any use of a percentile criterion for whiskers.

In short, this all looks to me a matter of misunderstandings.

Nick
n.j.cox@durham.ac.uk

William M. Doerner

Does anybody know what the -graph box- and -adjacent- commands are computing for their whiskers? They aren't using the typical +-1.5*IQR formula as listed in the help files and used by -lv-. The command -adjacent- adjusts the fence for min/max values, but the adjacent values are not lower=Q1-1.5*IQR and upper=Q3+1.5*IQR. The command -lv- computes the adjacent values with that formula, but it does not adjust the fence for min/max values. I am puzzled. I looked at the code, but I couldn't figure out why the commands have different outputs. Here is what I was running:

**BEGIN**

    use http://www.stata-press.com/data/r11/bplong, clear
    *graph box bp, over(when) over(sex)
    keep if sex==1 & when==2
    summarize bp, detail
    local u=r(p75)+(3/2)*(r(p75)-r(p25))
    local l=r(p25)-(3/2)*(r(p75)-r(p25))
    local l=max(`l',r(min))
    local u=min(`u',r(max))
    di `u'
    di `l'
    adjacent bp
    lv bp
    graph box bp, ylabel(#50, angle(horizontal))
    *Notice how the upper limit should be 173 instead of 169.

**END**

The difference between the commands is not as simple as "it's 5% and 95%." That looks true for the bp dataset, but it doesn't happen with the city temperature dataset. Here is more code to compare:

**BEGIN**

    use http://www.stata-press.com/data/r11/bplong, clear
    local x "if sex==1 & when==2 & bp~=."
    summarize bp `x', detail
    adjacent bp `x'
    lv bp `x'
    sysuse citytemp.dta, clear
    summarize tempjuly, detail
    adjacent tempjuly
    lv tempjuly

**END**

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
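For readers following the thread, the Tukey rule described in the reply can be sketched in a few lines of Stata. This is only an illustration, not the code of -adjacent- or -graph box-. It assumes the quartiles come from -summarize, detail-, which need not agree exactly with the fourths that -lv- reports, and the local macro names (q1, q3, iqr, lofence, hifence) are made up for this example.

```stata
* Sketch only: fences versus adjacent values under the usual Tukey rule.
* Assumption: quartiles are r(p25) and r(p75) from -summarize, detail-;
* -lv- works with fourths, which can differ slightly from these.
sysuse auto, clear
quietly summarize price, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

* Fences: the limits of the interval, not necessarily observed values
local lofence = `q1' - 1.5 * `iqr'
local hifence = `q3' + 1.5 * `iqr'

* Adjacent values: the most extreme observed values inside the fences
quietly summarize price if inrange(price, `lofence', `hifence'), meanonly
display "fences:          " `lofence' "  " `hifence'
display "adjacent values: " r(min) "  " r(max)
```

The point of the comparison is that the second -display- can never report a value that is absent from the data, whereas the first can, including a lower fence below the observed minimum.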

**References**: **st: boxplot whiskers with -lv- versus -adjacent-**, *From:* "William M. Doerner" <wmd07c@fsu.edu>
