Re: st: Advanced features for bar chart and histogram in Stata

From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Advanced features for bar chart and histogram in Stata
Date   Sun, 21 Aug 2011 13:53:31 +0100


My guess is that you would get closer to what you want by -reshape
long- followed by use of -over()- options.
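
As a rough sketch of that idea, not tested on your data: I assume that
the six variables are 0/1 indicators of a major problem, named as in
your command, and that the data have one observation per respondent.
The names -id-, -time-, -problem-, -symptom- and -when- are made up for
illustration.

. * hypothetical identifier, one per respondent
. gen long id = _n

. * first reshape: before and after become separate observations
. reshape long Heartburn_ Nausea_ Vomiting_, i(id) j(time) string

. rename Heartburn_ problemHeartburn
. rename Nausea_ problemNausea
. rename Vomiting_ problemVomiting

. * second reshape: one observation per respondent, time and symptom
. reshape long problem, i(id time) j(symptom) string

. * numeric version of time, so that before is shown to the left of after
. gen byte when = time == "after"
. label define when 0 "Before diagnosis" 1 "After diagnosis"
. label values when when

. graph bar (mean) problem, over(when) over(symptom) asyvars ///
legend(ring(0) position(1) cols(1)) ytitle("Proportion with major problem")

Here -asyvars- gives the two categories of -when- their own colours and
legend entries, the outer -over(symptom)- supplies the gaps and the
labels under each pair of bars, and -legend(ring(0) position(1))- asks
for the legend inside the plot region at upper right.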


The key here is that you can have different colours so long as you
have different variables. I have in preparation a Stata Tip on this
topic which I append below.

Highlighting specific bars

A frequent need when drawing a bar or dot chart is to highlight a subset
of observations while keeping the overall sort order. The stipulation of
keeping the overall sort order is what provides the challenge here, as
otherwise we could just add subdivision by another variable to the
command, as when distinguishing foreign cars among those with the best
repair record:

. sysuse auto, clear

. graph hbar (asis) mpg if rep78 == 5, over(make, sort(1) descending)

. graph hbar (asis) mpg if rep78 == 5, over(make, sort(1) descending) ///
over(foreign) nofill

Figure 1 shows graphs for these two commands. In Figure 1(a), the
ordering is within all the observations specified. In Figure 1(b), the
extra option -over(foreign)- subdivides observations according to the
further variable -foreign-. Note also the crucial detail of -nofill-.
This can be a useful kind of graph, but it is not what is
wanted here.

Let us suppose we have data on basin (catchment or watershed) areas for
various large rivers in the world, and we want to show where the
Mississippi comes in the rank order for the very largest. Some example
data from Allen (1997) are included with the media for this issue.

. use rivers

Figure 2 as a first graph shows that the Mississippi ranks third on area
of basin in this dataset, after the Amazon and Nile.

. graph hbar (asis) area if area >= 1000, over(name, sort(1) descending)

Highlighting a particular bar means giving it a different color. Some
acquaintance with the bar chart commands shows that they are willing to
combine bars for different variables, which will be assigned different
colors, so the need is simply to put data for two subsets, the
Mississippi and the others, into two different variables. -separate- is
a command designed for precisely this purpose. For other graphical
applications of -separate-, see Cox (2005). It is naturally also
possible to use -generate- directly.

. separate area, by(name == "Mississippi")

In this example, the equality supplied to -by()- is either false or
true, numerically 0 or 1, and so -separate- creates two new
variables, -area0- and -area1-.
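
For comparison, a minimal sketch of the direct route with -generate-.
The variable names -area_other- and -area_miss- are made up here, so
that they do not clash with the variables created by -separate-.

. * values for every river except the Mississippi, missing otherwise
. gen area_other = area if name != "Mississippi"

. * values for the Mississippi only, missing otherwise
. gen area_miss = area if name == "Mississippi"

These two variables could then be used in place of -area0- and -area1-.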

. graph hbar (asis) area0 area1 if area >= 1000, nofill ///
over(name, sort(area) descending) legend(off) ytitle("`: var label area'")

We are plotting bars for values that are non-missing on -area0- and
missing on -area1-, or vice versa. But by default -graph hbar- still
leaves room for a bar whose value is missing, so each river would be
shown with a bar and a gap. This is easy to fix: -nofill- gets us the
intended effect. In this case, we suppressed the legend, imagining that,
depending on the purpose, we could add a title for a presentation, as

title(Mississippi ranks third in catchment area)

or underline the message of the graph in informative text supplied in a
text editor or word processor. As two response variables are being shown on the
same graph, we have to step in to provide an informative y axis title,
in this case by automating use of the variable label for -area-.
Nothing stops us just providing a title explicitly, as when no such
variable label has been defined.

In principle, using -stack- should have the same effect as using
-nofill-. In practice, there can be small complications if there are
other missing values in the data, which are fixable with an appropriate
-if- exclusion.
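
A sketch of that variation, assuming -area0- and -area1- as created by
-separate- above. The extra condition -!missing(area)- is just one
possible exclusion of the kind mentioned, needed because missing values
count as greater than 1000 in Stata.

. * stack instead of nofill; observations with missing area excluded
. graph hbar (asis) area0 area1 if area >= 1000 & !missing(area), stack ///
over(name, sort(area) descending) legend(off) ytitle("`: var label area'")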

The main problem now being solved, we could clearly heighten the
contrast, as by adding -bar(1, bfcolor(none))-. Figure 3 shows the
graph after that tweak.

Similar needs are met by variations on this theme.

In our example, the subset to be highlighted is a single observation,
but nothing depends on that being true.

Equally, three or more subsets could be distinguished.  For a more
elaborate subdivision we might want a legend, although there is a
trade-off: the more complicated and elaborate the design, for which a
legend becomes necessary, the less the impact of the graph is likely to be.
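
As a sketch of a three-way subdivision: the choice of the Amazon as a
second river to pick out, the assumption that it is recorded as "Amazon"
in -name-, and the names -group- and -agrp- are all made up for
illustration.

. * 0 for other rivers, 1 for the Mississippi, 2 for the Amazon
. gen byte group = cond(name == "Mississippi", 1, cond(name == "Amazon", 2, 0))

. * creates agrp0, agrp1 and agrp2
. separate area, by(group) generate(agrp)

. graph hbar (asis) agrp0 agrp1 agrp2 if area >= 1000, nofill ///
over(name, sort(area) descending) ytitle("`: var label area'") ///
legend(order(2 "Mississippi" 3 "Amazon" 1 "Other rivers") rows(1))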

The examples all are based on showing values -asis-. If graphs of
this kind are needed, but for means or other summary statistics, it is
often easiest to -collapse- or -contract- the dataset first, and
then use -separate- and -graph hbar (asis)-.
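
A sketch with the auto data, in which mean mileage is shown by repair
record and the best category is highlighted. Note that -sysuse auto,
clear- replaces the data in memory.

. sysuse auto, clear

. * one observation per category of rep78, holding mean mpg
. collapse (mean) mpg, by(rep78)

. * mpg0 for the other categories, mpg1 for rep78 == 5
. separate mpg, by(rep78 == 5)

. graph hbar (asis) mpg0 mpg1, nofill over(rep78, sort(mpg) descending) ///
legend(off) ytitle("Mean mileage (mpg)")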

The same device can be used with -graph bar-, -graph dot- or
various subcommands of -twoway- such as -twoway bar-. In
practice, when we want this, the individual observations include names
that are informative, so horizontal alignment makes those names more
readable. If -graph dot- were to be used, we should consider
heightening the contrast, as by adding -marker(2, msize(*3))-.
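
A sketch for -graph dot-, again assuming -area0- and -area1- as created
by -separate- above:

. * marker 2 belongs to area1, the Mississippi, and is drawn three times larger
. graph dot (asis) area0 area1 if area >= 1000, nofill ///
over(name, sort(area) descending) legend(off) marker(2, msize(*3)) ///
ytitle("`: var label area'")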

Allen, P.A. 1997.
Earth Surface Processes.
Oxford: Blackwell Science.

Cox, N.J. 2005.
Stata tip 27: Classifying data points on scatter plots.
Stata Journal 5(4): 604--606.

On Sun, Aug 21, 2011 at 12:22 PM, Fredrik Norström
<[email protected]> wrote:
> Dear Statalist users,
> I am struggling with generating histograms like I want them to be. To avoid a lot of manual extra work I would rather like to solve it so that Stata generates it for me. I am doubtful if that is possible but hope that at least someone can verify that for me if so.
> In my questionnaire I have asked about symptoms before and after disease diagnosis. I want to generate a graph that includes 3 different symptoms (heartburn, nausea and vomiting) before and after diagnosis and the proportion of users with major problem for them. I have figured out that I can use
> "graph bar (mean)  Heartburn_before Heartburn_after Nausea_before Nausea_after Vomiting_before Vomiting_after" to generate such a graph but that graph does not look like I want it to be.
> I want it to be:
> 1) First two bars side-by-side for heartburn (to left before diagnosis and right after diagnosis) then a gap and those two for nausea side-by-side, a gap and those two for vomiting side-by-side. I have tried to use bargap but then all bars will have a difference between each other which is not what I am interested in.
> 2) Below bars for heartburn I want to have label "Heartburn" and similarly for nausea and vomiting. No idea how to do this right now without manually doing it in paint or another simple graphical program where I risk losing picture quality as well as having to redo everything if changes are necessary.
> 3) In upper right at graph I would like to have the legend with the labels "Before diagnosis" and "After diagnosis" with the colour of each of these bars. For every symptom I will have same colour for before and after diagnosis. This issue I know how I easily could solve by editing graph in Stata but would be nice to learn the code for how to specify these offsets.
> I hope that I despite the lack of an illustration have managed to explain what I am interested in.
> The other graph I am preparing for my paper looks at relation between diagnosis for two diseases. The graph should illustrate what disease that causes the other one. Also this problem I have solved it with histogram option in Stata. However, I am interested in making the graph more advanced. I would like it to be light color if one disease occurs before the other and a dark color if other disease occurs first. Is it possible to have a legend for a histogram where colors are different for different values?
