st: RE: Alternatives to boxplot+line
"Nick Cox" <email@example.com>
Fri, 28 Sep 2007 16:28:33 +0100
There are issues on various levels here. I'd want
to separate out
0. what works for Allan's data
1. Allan's personal graphical taste
2. collective habits or conventions
3. my program -stripplot-
I can't comment on 0.
On 3 there is no disagreement. -stripplot- just provides
a default that can be overridden by the user if it is
not appropriate. There are options in the program
to jitter (courtesy -twoway-), to stack tied data
points and to bin data points before plotting. All this
should be evident from reading the help.
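As a rough sketch of how those three options might be tried (not Allan's data: the auto dataset, mpg and rep78 are stand-ins, and the option names should be checked against the installed help for -stripplot-):

```stata
* Assumes -stripplot- has been installed, e.g. via: ssc install stripplot
sysuse auto, clear

* Jitter points at random so that ties are not overplotted
* (jitter() is passed through to -twoway scatter-)
stripplot mpg, over(rep78) jitter(5)

* Stack tied values instead, so that each observation stays visible
stripplot mpg, over(rep78) stack

* Bin values to a width of 2 before stacking
stripplot mpg, over(rep78) width(2) stack
```

Which of these is preferable depends on how many ties there are and on how much the exact values matter.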
On 1 and 2 we can, in the cant phrase, agree to differ.
I first encountered box plots in their incarnation
of dispersion diagrams as a student geographer about
40 years ago, before Tukey had even published
on the subject, which makes me less inclined to
swallow whole all the details of what Tukey did with this
particular graph form.
It is rather puzzling to me that the
convention that Tukey is quoted for, to show
whiskers covering data points between the quartiles
and the most extreme points within 1.5 IQR of
the nearer quartile, is so often treated so reverently
(especially since he changed his mind on what
was advisable at least once).
Tukey suggested this as a rule for hand plotting
when exploring small or moderate size datasets
and as a way of identifying possible outliers
that should be thought about individually.
That this then automatically carries through
to computer plotting of presentation graphics of
what may be much larger datasets is not to me self-evident.
Cleveland once briefly suggested plotting
whiskers out to the 10th and 90th percentiles. This
has been echoed by various statistical software. It seems
to me _easier_ to explain than the "within
1.5 IQR" rule.
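To make the two rules concrete, here is a minimal sketch that computes both sets of whisker ends for one variable. The auto dataset, mpg and the scalar names are illustrative assumptions only.

```stata
* Sketch only: auto.dta and mpg are stand-ins for real data
sysuse auto, clear
quietly summarize mpg, detail

* Percentile rule: whiskers simply end at the 10th and 90th percentiles
scalar p10 = r(p10)
scalar p90 = r(p90)

* "Within 1.5 IQR" rule: first find fences 1.5 IQR beyond each quartile
scalar iqr     = r(p75) - r(p25)
scalar lofence = r(p25) - 1.5 * scalar(iqr)
scalar hifence = r(p75) + 1.5 * scalar(iqr)

* The whiskers then end at the most extreme data values inside the fences
quietly summarize mpg if inrange(mpg, scalar(lofence), scalar(hifence))
scalar lowhisk = r(min)
scalar uppwhisk = r(max)

display "1.5 IQR rule:  " scalar(lowhisk) " to " scalar(uppwhisk)
display "10%/90% rule:  " scalar(p10) " to " scalar(p90)
```

The percentile rule needs one step; the 1.5 IQR rule needs the quartiles, the fences and then a second pass through the data.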
No matter: what -stripplot- does with its -box-
option is just to superimpose a (transparent) box with median
and quartiles on a strip plot of all values.
I can't see how that can be any more difficult
to explain or to understand than other flavours
of the box plot, or that it is in any sense
capricious. I didn't invent it, either.
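A minimal illustration of the -box- option, again assuming the auto dataset as a stand-in and that -stripplot- is installed from SSC:

```stata
* Assumes -stripplot- has been installed, e.g. via: ssc install stripplot
sysuse auto, clear

* Strip plot of all values with a box showing median and quartiles;
* no whiskers are drawn because every data point is already plotted
stripplot mpg, over(rep78) box
```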
As it happens, I helped Tom Steichen on an
early sunflower plot program as an interesting programming
project. What Bill Dupont and Gale Plummer and then StataCorp
did is much more impressive, but I have not yet really
found sunflower plots more effective than alternatives.
That can be put down to taste if you like, but I think
that efficient mental decoding of sunflowers is problematic.
With thousands of data points we are often in a different
graphical league anyway. What I find myself doing increasingly is
to use a grey scale colour to get the data points as a
backdrop and then to look for structure in some superimposed
smooth(s) of response w.r.t. predictor.
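One way this might look, as a sketch only: the variables are from the auto dataset, and lowess is just one possible choice of smooth.

```stata
* Sketch only: mpg and weight stand in for response and predictor
sysuse auto, clear

* Pale grey points as a backdrop, with a darker lowess smooth on top
twoway (scatter mpg weight, mcolor(gs12) msymbol(oh))        ///
       (lowess mpg weight, lcolor(black) lwidth(medthick)),  ///
       legend(order(1 "Data" 2 "Lowess smooth"))
```

With thousands of points a lighter grey or a smaller marker may be needed so that the smooth remains the dominant element.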
As attachments are discouraged on the list, the only
alternatives I can think of are (1) to post graphs on a website
(2) to send instructions for similar graphs using mutually
Recent users' group meetings heard that the graph editor
will be (shortly) more auditable (?word) than it
currently is. Whatever, the graph editor in 10 is of course
> Nick Cox, as ever, provides interesting questions and
> constructive suggestions.
> I'm loath to move to v10 and the graph editor, because that
> route doesn't provide an audit trail or commands for production use.
> The stripplot route would provide the nearest to the request,
> with a bit more code to plot outliers. Regardless of
> personal preferences, boxplots have documented conventions
> (which I personally like) and it doesn't aid communication to
> arbitrarily ignore them - there are enough badly-drawn graphs
> in publication without deliberate tinkering. As a comment on
> the plain stripplot, it's potentially misleading unless
> points are jittered to show concurrencies.
> What I forgot to mention was that my boxplot does have
> categories as well as a metric. The actual example shows the
> good linear fit over a range of concentrations, each
> concentration tested in two sets of equipment. I can fudge
> that by moving each set to the side of the nominal
> concentration, but I'm now clearly doing a lot of one-off
> coding to get the raw values, as Nick says, for rbar.
> Looking at the jittered stripplot+line, I thought a
> sunflowerplot might work even better. It gives a nice
> impression of density of data - note the "missing values"
> plotted as well, which are the basis of a logistic fit for
> the probability of getting a reading.
> sunflower gi lconc , binwidth(.25) addplot( lfit adjgi lconc
> if adjgi>25) legend(ring(0) pos(9) col(1) order(4 "Linear" 1
> "Single observation" 2 3 )) xlab( -7(1)0 ) ylab(20 "No
> reading" 26.2 30(2)42 44.3 ,angle(0)) ti(Alternative
> presentation of gi values and linear fit)
> Final question - how can one provide example graphs on the
> list if attachments are forbidden? (uuencoded .gph?)