st: Axis rules made to be broken

Thu, 15 May 2003 11:49:00 +0100 (BST)

Apart from the joy of writing that subject line, I feel able to comment because I have looked at the literature, come to a wider conclusion, and for some years have taught a module in data graphics.

Let me also quote Constantine Daskalakis ([CD] statalist 14 May): "First, I don't want to plot on the log scale. Why would I? Second, I don't want to waste three quarters of my graph area by using the full scale of the Y axis (i.e., 0-100%), when all my measurements are around 85-95%." I'm not making any personal criticism, but this is the type of language that implies graphic design is about style, whim or individual preference. If graphics are to be genuine tools of communication, we need to adopt and understand a common language. I reserve my most scathing comments for the software adverts that boast "use our program and with a few clicks within minutes you will produce impressive professional graphs." Imagine anyone making such a claim for a word processor!

Axes are misunderstood, I think, because we usually encounter them in school as the skeleton on which data values are then measured and plotted. When software is doing the plotting, this function disappears and the axis must become an informative part of the plot - or why is it needed at all?

My second point is even more obvious: a graph is used to portray either a magnitude or a relationship. A bar graph representing 500 units must have a bar whose length is proportional to 500, or what is the point? A moment's thought should convince you that a bar graph actually conveys information only if there is more than one bar, so that the relative lengths convey the relative sizes of the data. Anyone who doubts this can find a counter-example each Saturday in the (London) Daily Telegraph Review supplement (I can't find this on their web site). It is a small feature that, I assume, is a post-modern spoof: "How tall is ...?" each week gives the height of a film character accompanied by a picture drawn against a scale - a pictogram of one datum.

My third point is an explanation of the subject line: if graphics are to be a fertile and useful means of communication, there must be underlying rules BUT designers will bend or break the rules for creative effect. This is what we do with words. I too have seen books that state explicitly "any axis must include zero" (e.g., Schmid, 1983, Statistical Graphics), but then distort the meaning by allowing broken axes with a zig-zag. Rules learned by rote are misleading. I require students to identify their design decisions and give reasons for their choices. If a choice has been reasoned and justified, then it is "correct", even if I might have made a different choice.

An axis should contain zero *if the statement being made is one of absolute magnitude.* I have seen bar charts in The Economist (who should know better) in which the axis, and hence the bars, start at some arbitrary non-zero value. If they convey any information, the bars convey a lie. The artist drew such a graph (and the editor accepted it) because bars drawn in true ratio would not have shown any visible variation. The intended message was to show year-on-year change - so the choice of a bar chart was inappropriate and no tinkering with the axis range could correct this. I wonder also what readers gain in practice from bar charts where one category absolutely dominates, and a convention is to draw all other bars to scale but put a break (and broken axis) for the largest. The visual message has, at best, been diluted.

A graph that focuses on *change* need not include the origin of the absolute figures, because change implies "change from what?" and it is this reference point that becomes the de facto centre of attention. If bars are drawn, they should be from the reference point, but a line-plot may be more effective in emphasizing the direction of change.

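To put a number on the "lie" told by bars that start at an arbitrary non-zero value, here is a minimal Python sketch. The function name drawn_ratio and the figures (values of 90 and 95, axis starting at 85) are hypothetical, chosen only for illustration; they are not taken from any published chart.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when the axis (and hence each bar) starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must exceed the baseline")
    return (b - baseline) / (a - baseline)


# Hypothetical data: two values that differ by about 5.6%.
true_ratio = drawn_ratio(90, 95)                   # bars drawn from zero
truncated_ratio = drawn_ratio(90, 95, baseline=85) # bars drawn from 85

print(f"ratio with axis from zero: {true_ratio:.3f}")
print(f"ratio with axis from 85:   {truncated_ratio:.3f}")
```

With these made-up numbers the second bar is drawn twice as long as the first, although the values differ by under 6%. The bar lengths no longer state the magnitudes; at most they state the change from 85, which is the case for plotting the change itself from an explicit reference point.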
The choice of the axis range can be determined from the range in the data, pragmatically, or by an algorithm quoted by Cleveland, "Banking to 45 degrees". What is helpful (essential?) is for the designer to have a clear articulation of what they are trying to describe: "this x goes up", "this x goes up at an accelerating rate", "this x is going up faster than that y" etc. (Read Tufte for discussion of which values to label along the axis.)

CD gives an example of plotting percentages which is particularly informative. In many situations the interest lies not in x but in 100-x. "Use of PCs has gone up from 70 to 80%" is equivalent to "one third of those who did NOT use PCs now do so": non-users have fallen from 30% to 20%.

To relate this to the original question, I have illustrated above that a broken axis is, at best, a compromise introduced in the final stages of presentation, one that will never make a graph more informative.

Stata has, up to version 7, concentrated on graphics as an analytical tool. It eschewed the "gee whizz" emphasis on style characteristic of "presentation graphics software". By comparison, Stata output therefore lacked impact for audiences more impressed by style than substance. Version 8 offers various instant styles that remove the "sackcloth and ashes" asceticism. Adding a variety of gimmicks (and controls) that invite distortion rather than clarification seems to me a dangerous step. On the other hand, I still regret the failure to maintain Stage (the Stata Graphics Editor) which was just such a tool for making user-chosen changes to a basic graph.

R.