st: composite labels with -graph hbar-, -graph bar-, -graph dot-


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: composite labels with -graph hbar-, -graph bar-, -graph dot-
Date   Thu, 29 Aug 2013 16:41:12 +0100

This post grows out of questions asked by Mike Cross. Mike deserves all
the credit for an interesting problem. I've focused on the essence as I
see it, but if it is twisted beyond recognition, the debit is to be
attributed to me.

For the record, the threads started by Mike begin at

http://www.stata.com/statalist/archive/2013-08/msg01253.html

http://www.stata.com/statalist/archive/2013-08/msg01275.html

but I've tried to make this self-contained.

The essence of the problem is (nice) display of composite axis labels
with -graph hbar- (or -graph bar- or -graph dot-).  The problem extends
to include -twoway-, but the solutions do not.

A composite axis label here could arise from a combination of two (or
possibly three) variables. One context is hybrid graph-tables.

To make things concrete, consider the foreign cars from the auto dataset

. sysuse auto, clear
. keep if foreign

For a dataset like this, the identifier (here -make-) is almost always
something we want to show on something like a bar or dot chart. We might
also want to show numeric information on the axes.

One argument here is that this way you get the best of both worlds: the
graphical part of the display shows the general pattern and the details,
and you can look up the exact values too.

The conservative objection that graphs are graphs and tables are tables,
and ne'er the twain shall meet, is thus being firmly rebutted. For more
on this, if you wish, see Cox, N.J. 2008. Between tables and graphs.
Stata Journal 8: 269-289
http://www.stata-journal.com/article.html?article=gr0034

Maarten Buis pointed to an -axis()- function from -egenmore- (SSC) which
is a helper function to combine variables to create a variable that can
be used as a single graph axis (hence the name). However, that's only a
partial solution as in effect it creates a variable with value labels
such as "Maarten 42" and "Mike 567" or "42 Maarten" and "567 Mike" which
won't line up well in general. (The user-programmer being maligned here
for a partial solution is myself.)
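
To see the alignment problem without installing anything, here is a
minimal sketch using official commands only. It does not call -axis()-
itself; the variable name -composite- and the graph name are just
illustrative assumptions. It glues -make- and -mpg- into one string and
uses that as the single axis variable:

```stata
sysuse auto, clear
keep if foreign

* hypothetical variable: identifier and value glued into one string
gen composite = make + " " + string(mpg)

* one -over()- only; the numbers trail names of differing lengths,
* so they do not form a tidy column
graph hbar (asis) mpg, over(composite, sort(mpg)) name(composite_sketch, replace)
```

The labels are all there, but the values start wherever each name
happens to end, which is what the solutions below avoid.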

There are better ways. First we look at a simple solution that often
works. Clone the variable you want to show:

. clonevar price2 = price

. graph hbar (asis) price, over(make) over(price2, gap(*0.5)) nofill

Taking that more slowly,

1. -make- is an identifier, with a distinct value for each
observation.

2. Combining two -over()- options instructs -graph- to show all the
cross-combinations of the variables named, but in this case and many
others several cross-combinations do not exist in the data, so the
-nofill- option is crucial to remove the gaps that would be created in
the graph. If you forget the -nofill- option, the graph may not even
be readable. In this example, there are as many distinct values of
-price- as observations, so -graph- would be trying to show 22 * 22 =
484 bars, whereas only 22 bars are defined by the data.

3. Two -over()- options by default imply thin bars because of two sets
of gaps. That can be tuned to taste. -gap(*0.5)- is one choice.

4. Using -clonevar- rather than -generate- ensures that variable and
value labels in particular are carried over. That may not be needed, but
it does no harm.

5. Note that

. graph hbar (asis) price, over(make) over(price, gap(*0.5)) nofill

doesn't work, as the attempt to get -price- to play two roles is asking
too much, but a clone does the job.

However, this trick is sensitive to whether the values of the
quantitative variable (here -price-) are all distinct, with no ties.

Consider instead -mpg-, for which there are 13 distinct values for the
22 observations of foreign cars in this dataset.

. clonevar mpg2 = mpg

. graph hbar (asis) mpg, over(mpg2) over(make, gap(*0.5) sort(mpg)) nofill

will work, but

. graph hbar (asis) mpg, over(make) over(mpg2, gap(*0.5)) nofill

may not be what you want.
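
Whether a variable is safe for the clone trick can be checked before
drawing anything. The following is a sketch using official commands
only; the variable name -first- is an arbitrary choice of mine. -isid-
fails unless the values are all distinct, and -egen, tag()- marks one
observation for each distinct value so that they can be counted:

```stata
sysuse auto, clear
keep if foreign

* -isid- gives a zero return code only if the values are all distinct
capture isid price
display "return code for price: " _rc

capture isid mpg
display "return code for mpg: " _rc

* tag one observation per distinct value of mpg, then count the tags
egen byte first = tag(mpg)
count if first
```

A nonzero return code for -mpg- signals ties, and the count should
match the 13 distinct values mentioned above.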

There is, however, a direct solution to ensure that tied values are
shown distinctly. Think first of the sort order you want for your graph.
Here we simply go for sorting on -mpg-:

. sort mpg

The easy first part is that the order of observations is now the order
you want for your graph axis:

. gen axis = _n

The values of -axis- come from the observation numbers and so are integers
from 1 up. The more challenging second part is that we want to see the values
of -mpg- on the graph, for which the solution is to use value labels.

. labmask axis, values(mpg)

shows off a helper command -labmask-, which should be installed from the
Stata Journal archives. (-search labmask, sj- shows that it was
discussed in the 2008 paper mentioned earlier.) The perhaps whimsical
command name is intended to convey that a variable wears a "mask" which
is visible from outside.

. graph hbar (asis) mpg, over(axis) over(make, gap(*0.5) sort(mpg)) nofill

. graph hbar (asis) mpg, over(make) over(axis, gap(*0.5)) nofill

are both now in reach, and you can choose accordingly.
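
For anyone who cannot install -labmask-, here is a rough sketch of the
same idea with official commands only. It is not the code of -labmask-
itself, and the value label name -axislbl- and the graph name are
arbitrary choices of mine. It loops over the observations and attaches
the value of -mpg- as the label of each integer in -axis-:

```stata
sysuse auto, clear
keep if foreign

* sorting on make as well fixes the order within ties on mpg
sort mpg make
gen axis = _n

* label each integer 1, 2, ... with the mpg value in that observation
forvalues i = 1/`=_N' {
    label define axislbl `i' "`=mpg[`i']'", add
}
label values axis axislbl

graph hbar (asis) mpg, over(make) over(axis, gap(*0.5)) nofill name(g6_sketch, replace)
```

Tied values of -mpg- then get the same label text but distinct
underlying integers, which is why they are shown separately.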

Here are all the commands gathered together for anyone who wishes to run
them as a miniature tutorial. I have added -name()- options after each
graph call so that the graphs may all be compared. -labmask- (SJ) must
be installed first.

sysuse auto, clear

keep if foreign

clonevar price2 = price

graph hbar (asis) price, over(make) over(price2, gap(*0.5)) nofill name(g1)

graph hbar (asis) price, over(make) over(price, gap(*0.5)) nofill ///
title(this doesn't work as hoped!) name(g2)

clonevar mpg2 = mpg

graph hbar (asis) mpg, over(mpg2) over(make, gap(*0.5) sort(mpg)) ///
nofill name(g3)

graph hbar (asis) mpg, over(make) over(mpg2, gap(*0.5)) nofill name(g4)

sort mpg

gen axis = _n

labmask axis, values(mpg)

graph hbar (asis) mpg, over(axis) over(make, gap(*0.5) sort(mpg)) ///
nofill name(g5)

graph hbar (asis) mpg, over(make) over(axis, gap(*0.5)) nofill ///
name(g6)

Nick
[email protected]