From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Alternatives to boxplot+line
Date: Fri, 28 Sep 2007 16:28:33 +0100

There are issues on various levels here. I'd want to separate out

0. what works for Allan's data
1. Allan's personal graphical taste
2. collective habits or conventions
3. my program -stripplot-

I can't comment on 0.

On 3 there is no disagreement. -stripplot- just provides a default that can be overridden by the user if it is not appropriate. There are options in the program to jitter (courtesy -twoway-), to stack tied data points and to bin data points before plotting. All this should be evident from reading the help.

On 1 and 2 we can, in the cant phrase, agree to differ. I first encountered box plots in their incarnation as dispersion diagrams as a student geographer about 40 years ago, before Tukey had even published on the subject, which makes me less inclined to swallow whole all the details of what Tukey did with this particular graph form. It is rather puzzling to me that the convention that Tukey is quoted for, to show whiskers covering data points between the quartiles and the most extreme points within 1.5 IQR of the nearer quartile, is so often treated so reverently (especially since he changed his mind on what was advisable at least once). Tukey suggested this as a rule for hand plotting when exploring small or moderate-sized datasets, and as a way of identifying possible outliers that should be thought about individually. That this should then carry over automatically to computer-drawn presentation graphics of what may be much larger datasets is not self-evident to me. Cleveland once briefly suggested plotting whiskers out to the 10th and 90th percentiles. Various statistical programs have echoed this. It seems to me _easier_ to explain than the "within 1.5 IQR" rule. No matter: what -stripplot- does with its -box- option is just to superimpose a (transparent) box showing median and quartiles on a strip plot of all values.
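To make the contrast between the two whisker conventions concrete, here is a small illustrative sketch. It is written in Python rather than Stata, and the function names are hypothetical. It assumes the standard library's `statistics.quantiles` with its "inclusive" interpolation, which need not match the quantile definition Stata uses for box plots, so results on real data may differ slightly. It computes the box itself (quartiles and median, as -stripplot-'s -box- option draws) and the whisker ends under Tukey's 1.5 IQR rule and under the 10th/90th percentile suggestion.

```python
"""Box and whisker ends under two conventions (illustrative sketch).

Assumption: quartiles and percentiles use Python's "inclusive"
interpolation, which need not match the definition Stata uses.
"""
from statistics import quantiles


def box_stats(values):
    """Lower quartile, median and upper quartile: the box itself."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return q1, med, q3


def tukey_whiskers(values, k=1.5):
    """Tukey: whiskers end at the most extreme data points within k * IQR
    of the nearer quartile; points beyond are returned for individual
    scrutiny as possible outliers."""
    q1, _, q3 = box_stats(values)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    beyond = sorted(v for v in values if v < lo_fence or v > hi_fence)
    return min(inside), max(inside), beyond


def percentile_whiskers(values, lo=10, hi=90):
    """Percentile rule: whiskers end at the lo-th and hi-th percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lo - 1], cuts[hi - 1]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
    print("box (q1, median, q3):", box_stats(sample))
    print("Tukey whisker ends and points beyond:", tukey_whiskers(sample))
    print("10th/90th percentile whisker ends:", percentile_whiskers(sample))
```

On this toy sample the box runs from 3.5 to 8.5 with median 6. The Tukey rule ends the whiskers at 1 and 10 and singles out 50, whereas the percentile rule ends them at 2 and 10, leaving both 1 and 50 beyond the whiskers. The percentile rule is simpler to state, but by construction it leaves roughly 20% of a large sample outside the whiskers, so points beyond them are not being flagged as outliers in Tukey's sense.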
I can't see how superimposing such a box can be any more difficult to explain or to understand than other flavours of the box plot, or that it is in any sense capricious. I didn't invent it, either.

As it happens, I helped Tom Steichen with an early sunflower plot program as an interesting programming project. What Bill Dupont and Gale Plummer and then StataCorp did is much more impressive, but I have not yet really found sunflower plots more effective than alternatives. That can be put down to taste if you like, but I think that efficient mental decoding of sunflowers is problematic.

With thousands of data points we are often in a different graphical league anyway. What I find myself doing increasingly is to use a grey-scale colour to show the data points as a backdrop and then to look for structure in some superimposed smooth(s) of response w.r.t. predictor.

As attachments are discouraged on the list, the only alternatives I can think of are (1) to post graphs on a website or (2) to send instructions for similar graphs using mutually accessible datasets.

Recent users' group meetings heard that the graph editor will (shortly) be more auditable (?word) than it currently is. Whatever, the graph editor in 10 is of course not compulsory!

Nick
n.j.cox@durham.ac.uk

Allan Reese

> Nick Cox, as ever, provides interesting questions and
> constructive suggestions.
>
> I'm loath to move to v10 and the graph editor, because that
> route doesn't provide an audit trail or commands for production use.
>
> The stripplot route would provide the nearest to the request,
> with a bit more code to plot outliers. Regardless of
> personal preferences, boxplots have documented conventions
> (which I personally like) and it doesn't aid communication to
> arbitrarily ignore them - there are enough badly-drawn graphs
> in publication without deliberate tinkering. As a comment on
> the plain stripplot, it's potentially misleading unless
> points are jittered to show concurrencies.
>
> What I forgot to mention was that my boxplot does have
> categories as well as a metric. The actual example shows the
> good linear fit over a range of concentrations, each
> concentration tested in two sets of equipment. I can fudge
> that by moving each set to the side of the nominal
> concentration, but I'm now clearly doing a lot of one-off
> coding to get the raw values, as Nick says, for rbar.
>
> Looking at the jittered stripplot+line, I thought a
> sunflowerplot might work even better. It gives a nice
> impression of density of data - note the "missing values"
> plotted as well, which are the basis of a logistic fit for
> the probability of getting a reading.
>
> sunflower gi lconc , binwidth(.25) addplot( lfit adjgi lconc
> if adjgi>25) legend(ring(0) pos(9) col(1) order(4 "Linear" 1
> "Single observation" 2 3 )) xlab( 0(1)-7 ) ylab(20 "No
> reading" 26.2 30(2)42 44.3 ,angle(0)) ti(Alternative
> presentation of gi values and linear fit)
>
> Final question - how can one provide example graphs on the
> list if attachments are forbidden? (uuencoded .gph?)

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
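Sunflower plots, like the option in -stripplot- to bin data before plotting, start from counts of observations per bin. The Python sketch below illustrates only that counting step, and it is a simplification throughout. Stata's -sunflower- bins on hexagons and has its own rules for drawing flowers, whereas here the bins are plain rectangles of hypothetical widths `xwidth` and `ywidth`, and `sunflower_layout` is a made-up name. A bin holding one observation keeps that point as an ordinary marker (compare the "Single observation" legend entry in Allan's command); in this simplification a bin holding more becomes a flower at the bin centre with one petal per observation.

```python
"""Rectangular-bin counting behind a sunflower-style display (sketch).

Simplification (assumption): Stata's -sunflower- uses hexagonal bins;
rectangles are used here only so the counting step is easy to follow.
"""
import math
from collections import defaultdict


def sunflower_layout(xs, ys, xwidth, ywidth):
    """Return (singles, flowers).

    singles: observations alone in their bin, plotted as themselves.
    flowers: (bin centre, count) for bins with two or more observations;
             in this sketch each observation becomes one petal.
    """
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        # math.floor keeps negative values (e.g. log concentrations)
        # in the correct bin, unlike int(), which truncates towards zero.
        key = (math.floor(x / xwidth), math.floor(y / ywidth))
        bins[key].append((x, y))

    singles, flowers = [], []
    for (i, j), points in sorted(bins.items()):
        if len(points) == 1:
            singles.append(points[0])
        else:
            centre = ((i + 0.5) * xwidth, (j + 0.5) * ywidth)
            flowers.append((centre, len(points)))
    return singles, flowers


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 1.0]
    ys = [0.1, 0.15, 0.9, 1.0]
    singles, flowers = sunflower_layout(xs, ys, 0.25, 0.25)
    print("single observations:", singles)
    print("flowers (centre, petals):", flowers)
```

The bin widths play the role that -binwidth()- plays in Allan's command: wider bins give fewer, larger flowers, which is part of why decoding densities from petal counts can be hard.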

References:
st: Alternatives to boxplot+line
From: "Allan Reese (Cefas)" <allan.reese@cefas.co.uk>
