
From: "Maarten Buis" <M.Buis@fsw.vu.nl>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: RE: st: wrapping title with by option
Date: Fri, 20 Jul 2007 15:38:11 +0200

Thanks Nick,

It's a pity, though I agree with most of the points you made (I think all of them, but there are so many I might have missed one I disagree with).

Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of n j cox
Sent: Friday, 20 July 2007 15:10
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: wrapping title with by option

My reading of this is to agree that you can't do what Maarten wants using the -by()- option.

This is, from one point of view, a limitation of -by()-. However, from most other points of view, sacrificing large amounts of precious space to separate graph headings is poor, or at least unfortunate, design, and you shouldn't want to do it.

Looking at the graph in question, produced using R, I wouldn't try to replicate it, as I think it needs a new design. I haven't read the paper, so I am just focusing on this graph in isolation.

What Maarten is trying to mimic are two titles:

Percentage of Graphs      Percentage of
and Tables Combined,      Graphs Within
by Category               Each Category

I can pick various nits:

1. Too much use of upper case. Upper case is needed for proper names, but there are none here. It takes up more space than lower case with nice fonts. In any case, Too Many Capitals amount to Shouting.

2. Repetition of "Percentage".

3. "percent" [sic] would be fine to indicate units, but it should be moved to the bottom of the graph, where there is space to spare, and used just once. Then the "%" arbitrarily added to two axis labels (Stata terminology) can be removed.

4. "by Category" and "Within Each Category" look superfluous.
It's a fine goal that graphs be self-explanatory, but these words don't add anything to my understanding of the graph.

That leaves two titles, "graphs and tables" and "graphs", which should be unproblematic in Stata.

I've started, so I'll finish:

5. In my view, ticks imply position on a numerical scale. The vertical axis is categorical and the ticks just make the graph busier. Cut!

6. As with #1, the mix of upper and lower case on the vertical axis is distracting and unnecessary.

7. The label "Summary Stats" is slangy and inappropriate in any (international) professional journal. To my ears, it is divisive. Slang that some people use is not preferable to proper professional language. "summary statistics" would be better. (There's enough space given other labels.)

8. One value, 100%, is shown as a point symbol that lies on the vertical axis. I prefer the convention of a small offset so that all data points lie within the plot region.

9. Most importantly, I am not clear that juxtaposed panels with different scales are the best way of allowing comparisons here. Presumably, the authors want us to compare two sets of numbers, but their format does not make that easy or effective.

R is a wonderful language, and it can produce superb graphics, including many things that Stata cannot (yet) do easily. (The reverse will also be true, but I don't know enough about R to be able to say what one can do easily in Stata but one couldn't do (easily) in R.) So, I don't want to knock R. That said, the amount of R code used for this example by the authors is dismaying. But if I can decode one comment on the Gelman blog correctly, theirs is not a very good example of R use.

I just glanced at the next figure, which is a mosaic plot. Mosaic plots are a very ingenious idea, but the key issue is, as always, Do they work?
When they are easy to decode there is an even easier alternative form and when they are difficult to decode they are not much use, except that you are regarded as awkward or negative if you point that out.

The root idea is encoding categorical frequencies by _areas_, but decoding areas is inefficient, as Bill Cleveland showed clearly twenty and more years ago. Mosaic plot users seem to realise this, as they typically colour-code different kinds of areas to try to draw attention to what you should be noticing. Colour encoding can be even less efficient than area encoding for showing _quantitative_ contrasts unless handled very carefully. It may well be that I have yet to see the point, but I find most complicated mosaic plots no more transparent than the original tables.

In the authors' Figure 2, mosaic plots are used for showing two 2 x 2 tables. No-one knowing anything about my work could accuse me of being against graphics, but I do suggest that such tables usually don't need much graphical back-up.

Nevertheless, simple plots such as those produced by my -tabplot- and -tableplot- (downloadable from SSC) are an easier alternative to mosaic plots here. In each the idea is that of a tabular array of bars, so that categorical frequencies are encoded by bar _heights_.

Admittedly, graphics for categorical data remains a problematic area. As with multivariate graphics, there are lots of ideas, each with enthusiastic proponents convinced that theirs is the true path to follow, but each failing to convince many others.

Some of the ideas in Cleveland's books remain under-used.

Cleveland, W.S.
1994. The elements of graphing data. [read first]
1993. Visualizing data.
both from Hobart Press, Summit, NJ (which, like Edward Tufte's operation, appears to exist only to publish the author's books).
Maarten Buis

--- Austin Nichols wrote:
> One option to add a second line is to use -subtitle("extra line",
> suffix)- but this is clearly not a general solution, since it adds the
> same second line to each graph. It seems that the -by()- option
> inevitably does not give one sufficient flexibility--but that option
> just automates the construction of multiple graphs that could also be
> produced separately and combined, so one general solution is to just
> do it manually. Note that -levelsof- and -foreach- are overkill here,
> but easier to extend to cases where there are more than two by-groups.

Austin, thanks for your reply. The reason I am trying to avoid -graph combine- is that it almost never looks nice whenever the axis labels/titles aren't equally wide. In this case I am trying to reproduce this graph: http://tables2graphs.com/doku.php?id=03_descriptive_statistics#figure_1 , so no y-labels in the second graph. You can tweak it by using the -fxsize()- option, but that is quite fragile (you'll have to re-tweak the graph whenever you change the y-labels or whenever you use a different font). This is undesirable since this is intended as a code example that others might be able to use on their own data. The -by()- option automatically takes care of this problem, as can be seen in the example below.

*--------- begin example --------
sysuse auto, clear
scatter pri mpg if for==0, /*
*/ name(dom, replace)
scatter pri mpg if for==1, /*
*/ name(for, replace) /*
*/ ylab(none) ytitle("")
graph combine dom for, /*
*/ ycommon xcommon /*
*/ name("combined", replace)
scatter pri mpg, by(for)
*------- end example ------------

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

