Statalist


st: RE: Alternatives to boxplot+line


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Alternatives to boxplot+line
Date   Fri, 28 Sep 2007 16:28:33 +0100

There are issues on various levels here. I'd want 
to separate out 

0. what works for Allan's data
1. Allan's personal graphical taste
2. collective habits or conventions
3. my program -stripplot- 

I can't comment on 0. 

On 3 there is no disagreement. -stripplot- just provides
a default that can be overridden by the user if it is 
not appropriate. There are options in the program 
to jitter (courtesy -twoway-), to stack tied data 
points and to bin data points before plotting. All this 
should be evident from reading the help. 
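As a rough illustration of those three options, here is a minimal sketch. It assumes -stripplot- has been installed from SSC and uses the auto dataset shipped with Stata as a stand-in; the jitter amount and bin width are arbitrary choices for this example, not recommendations.

```stata
* assumption: -stripplot- is installed (ssc install stripplot)
sysuse auto, clear

* jitter points to separate ties (jitter() is passed through to -twoway-)
stripplot mpg, over(rep78) jitter(3)

* stack tied data points instead of overplotting them
stripplot mpg, over(rep78) stack

* bin values into classes of width 500 before stacking
stripplot price, over(rep78) width(500) stack
```

Which of these reads best depends on how many ties the data contain; the help file remains the reference for exact option behaviour.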

On 1 and 2 we can, in the cant phrase, agree to differ. 
I first encountered box plots in their incarnation
of dispersion diagrams as a student geographer about
40 years ago, before Tukey had even published 
on the subject, which makes me less inclined to
swallow whole all the details of what Tukey did with this 
particular graph form. 

It is rather puzzling to me that the 
convention that Tukey is quoted for, to show 
whiskers covering data points between the quartiles
and the most extreme points within 1.5 IQR of
the nearer quartile, is so often treated so reverently
(especially since he changed his mind on what 
was advisable at least once). 

Tukey suggested this as a rule for hand plotting 
when exploring small or moderate size datasets
and as a way of identifying possible outliers
that should be thought about individually. 
That this then automatically carries through 
to computer plotting of presentation graphics of 
what may be much larger datasets is not to me self-evident. 
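For concreteness, the "within 1.5 IQR" convention can be computed by hand. The following is a sketch only, using the auto dataset as an arbitrary example; the scalar names lofence and hifence are made up for this illustration.

```stata
sysuse auto, clear

* quartiles of price
quietly summarize price, detail
scalar lofence = r(p25) - 1.5 * (r(p75) - r(p25))
scalar hifence = r(p75) + 1.5 * (r(p75) - r(p25))

* whiskers end at the most extreme data values inside the fences
quietly summarize price if inrange(price, lofence, hifence)
display "whiskers run from " r(min) " to " r(max)

* values beyond the fences would be plotted individually
count if !inrange(price, lofence, hifence) & !missing(price)
```

The last count is the number of points that the rule flags for individual attention, which is manageable by hand in a small dataset but can become large in a big one.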

Cleveland once briefly suggested plotting 
whiskers out to the 10th and 90th percentiles. This 
has been echoed by various statistical software. It seems 
to me _easier_ to explain than the "within 
1.5 IQR" rule. 
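The percentile version needs no fences at all. A minimal sketch, again with the auto dataset as an assumed example:

```stata
sysuse auto, clear

* 10th, 25th, 50th, 75th and 90th percentiles in one pass
_pctile price, percentiles(10 25 50 75 90)

display "whiskers: " r(r1) " to " r(r5)
display "box:      " r(r2) " to " r(r4)
display "median:   " r(r3)
```

By construction, about 10% of the values lie beyond each whisker, whatever the sample size or the shape of the distribution.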

No matter: what -stripplot- does with its -box-
option is just to superimpose a (transparent) box showing the median
and quartiles on a strip plot of all values. 
I can't see how that can be any more difficult 
to explain or to understand than other flavours
of the box plot, or that it is in any sense 
capricious. I didn't invent it, either. 
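A sketch of that display, assuming -stripplot- is installed from SSC and using the auto dataset purely as an example:

```stata
* assumption: -stripplot- is installed (ssc install stripplot)
sysuse auto, clear

* all values as stacked points, with a median-and-quartile box on top
stripplot mpg, over(foreign) stack box vertical
```

Since every data point is shown, nothing needs to be singled out as an outlier by rule; the box only adds the median and quartiles.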

As it happens, I helped Tom Steichen on an 
early sunflower plot program as an interesting programming 
project. What Bill Dupont and Gale Plummer and then StataCorp 
did is much more impressive, but I have not yet really 
found sunflower plots more effective than alternatives. 
That can be put down to taste if you like, but I think 
that efficient mental decoding of sunflowers is problematic. 

With thousands of data points we are often in a different 
graphical league anyway. What I find myself doing increasingly is
to use a grey scale colour to get the data points as a 
backdrop and then looking for structure in some superimposed
smooth(s) of response w.r.t. predictor. 
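A minimal sketch of that kind of graph follows. The auto dataset has only 74 observations, so it is merely a stand-in for a large dataset; the grey shade gs12 and the choice of lowess as the smoother are assumptions made for this example.

```stata
sysuse auto, clear

* data points in pale grey as a backdrop, lowess smooth drawn on top
twoway (scatter mpg weight, mcolor(gs12) msymbol(oh)) ///
       (lowess mpg weight, lwidth(medthick)),         ///
       legend(off) ytitle("Mileage (mpg)")
```

Any other smoother available in -twoway-, such as -lpoly- or -fpfit-, could be superimposed in the same way.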

As attachments are discouraged on the list, the only 
alternatives I can think of are (1) to post graphs on a website
or (2) to send instructions for similar graphs using mutually 
accessible datasets. 

Recent users' group meetings heard that the graph editor
will shortly be more auditable (?word) than it 
currently is. In any case, the graph editor in Stata 10 is of course
not compulsory! 

Nick 
n.j.cox@durham.ac.uk 

Allan Reese
 
> Nick Cox, as ever, provides interesting questions and 
> constructive suggestions.
> 
> I'm loath to move to v10 and the graph editor, because that 
> route doesn't provide an audit trail or commands for production use.
> 
> The stripplot route would provide the nearest to the request, 
> with a bit more code to plot outliers.  Regardless of 
> personal preferences, boxplots have documented conventions 
> (which I personally like) and it doesn't aid communication to 
> arbitrarily ignore them - there are enough badly-drawn graphs 
> in publication without deliberate tinkering.  As a comment on 
> the plain stripplot, it's potentially misleading unless 
> points are jittered to show concurrencies.
> 
> What I forgot to mention was that my boxplot does have 
> categories as well as a metric.  The actual example shows the 
> good linear fit over a range of concentrations, each 
> concentration tested in two sets of equipment.  I can fudge 
> that by moving each set to the side of the nominal 
> concentration, but I'm now clearly doing a lot of one-off 
> coding to get the raw values, as Nick says, for rbar.
> 
> Looking at the jittered stripplot+line, I thought a 
> sunflowerplot might work even better.  It gives a nice 
> impression of density of data - note the "missing values" 
> plotted as well, which are the basis of a logistic fit for 
> the probability of getting a reading.
> 
> sunflower gi lconc, binwidth(.25) ///
>     addplot(lfit adjgi lconc if adjgi>25) ///
>     legend(ring(0) pos(9) col(1) order(4 "Linear" 1 "Single observation" 2 3)) ///
>     xlab(0(1)-7) ylab(20 "No reading" 26.2 30(2)42 44.3, angle(0)) ///
>     ti(Alternative presentation of gi values and linear fit)
> 
> Final question - how can one provide example graphs on the 
> list if attachments are forbidden?  (uuencoded .gph?)
> 
