Daphna Bassok started a thread by asking various questions on box plots.
Here I edit slightly, also numbering the questions, DB1 ... DB4.

DB1. Is there any way I can get labels on my box plot graphs?

Guilherme Silva answered

>> Supposing the variable of interest is named "xvar",
>> the variable of identification - "case", and that you
>> have seen just 4 outliers in a previous screening ... then to
>> identify outliers (outsides in the box plot) you may type:
>>
>> . graph box xvar, medtype(line) mark(1,mlabel(case))

... and he pointed out that the rule is a separate -mark(,)- option
for each y variable.

DB2. I would like to see the values of the median, 25th percentile,
75 %...etc.

I answered

>> Use -summarize, detail- to see the median and quartiles.

DB3. I want to see/know the values of the top and bottom cut off lines.
How do I find these values?

I answered

>> The adjacent values are the extreme data points within
>> 1.5 iqr of the nearer quartile. I think you might have
>> to re-create those for yourself, as -graph box- doesn't
>> seem to leave them in memory. Nor should it really,
>> as there could be lots of them.

I also posted the code of a program -adjacent- to calculate these,
and commented

>> I seem to get the same values as do the box
>> plot routines. Note that adjacent values
>> need not be unique. More testing advisable.

Ric Uslaner wrote

>> I copied -adjacent- into the do file editor and
>> tried to run it ... and this is what I got:
>>
>> you must specify the lname() option
>> r(198);

whereas Clive Nicholas reported no problem. He suggested -update q-.

The message Ric was seeing was coming from official Stata
-egen, group()-, which is called by -adjacent-. I am not clear
why he's getting it. As far as I can see it shouldn't happen.
If it persists, do flag that privately.
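
For anyone who cannot get -adjacent- to run, the definition quoted
under DB3 can also be applied by hand with official commands. What
follows is only a sketch, not the code of -adjacent-. It assumes the
auto data shipped with Stata (the mpg and make variables in the
-extremes- examples look like that dataset, but the thread does not
say so), a single variable, no weights, and that -graph box- uses the
same quartiles as -summarize, detail-.

```stata
* Sketch only: adjacent values by hand, using official commands.
* Assumption: the auto data shipped with Stata and its variable mpg.
sysuse auto, clear

* quartiles and interquartile range
quietly summarize mpg, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

* lower adjacent value: smallest value within 1.5 iqr of the lower quartile
quietly summarize mpg if mpg >= `q1' - 1.5 * `iqr', meanonly
local lower = r(min)

* upper adjacent value: largest value within 1.5 iqr of the upper quartile
quietly summarize mpg if mpg <= `q3' + 1.5 * `iqr', meanonly
local upper = r(max)

display "adjacent values: `lower' and `upper'"

* outside values are those beyond the adjacent values
list make mpg if (mpg < `lower' | mpg > `upper') & !missing(mpg)
```

The final -list- also bears on DB4: it shows the outside values with
an identifying variable alongside. Several observations may share
the value reported as `lower' or `upper'.
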

Daphna also asked privately, and I take the liberty of echoing
the question here as others may be interested:

>> I am not sure I follow why the lower
>> and upper adjacent values are not
>> unique for a given population

What I meant was that there could be ties for an adjacent value.
Naturally, there could also be ties even for the most extreme
outliers.

I have now extended -adjacent- so that it supports multiple variables
in the varlist and also frequency and analytic weights. I'll send
the files to Kit Baum for posting on SSC.

DB4. I am interested in analyzing the outliers or outside values,
but I am not able to see what the specific lower and upper cut off
values are.

Another program which may be of interest here is -extremes- from SSC.
With the -iqr- option, or with -iqr(1.5)-, you can see which
observations are more than 1.5 iqr from the nearer quartile:

. extremes mpg, iqr

  +--------------------+
  | obs:    iqr:   mpg |
  |--------------------|
  |  59.   2.286    41 |
  +--------------------+

What's often more useful is to specify other variables, which are
included in the listing as context:

. extremes mpg make, iqr

  +--------------------------------+
  | obs:    iqr:   mpg   make      |
  |--------------------------------|
  |  59.   2.286    41   VW Diesel |
  +--------------------------------+

Just added to -extremes-, but not yet in the version on SSC,
is support for -by:-.

. bysort for : extremes mpg make, iqr

-------------------------------------
-> foreign = Domestic

  +----------------------------------+
  | obs:    iqr:   mpg   make        |
  |----------------------------------|
  |  23.   2.182    34   Plym. Champ |
  +----------------------------------+

--------------------------------------------------------------------------------------------------
-> foreign = Foreign

  +--------------------------------+
  | obs:    iqr:   mpg   make      |
  |--------------------------------|
  |  71.   1.857    41   VW Diesel |
  +--------------------------------+

Nick
n.j.cox@durham.ac.uk

