Graph colors by variable

Highlights

  • Use marker colors to convey variable information

  • Vary colors continuously or discretely

  • Specify how colors should be linked to values of the color variable

  • Works with many two-way plots, including scatterplots and bar charts

Want the colors of the points in your scatterplot to reflect age groups? The colors of the bars in your bar graph to reflect income level? The colors of the dots in your dot plot to reflect health status?

In Stata 18, the new colorvar() option allows many twoway plots to vary the color of markers, bars, and more based on the values of a variable.

Let's see it work

To draw a scatterplot of mpg against price with the markers colored according to weight, type

. sysuse auto, clear
(1978 automobile data)

. twoway scatter mpg price, colorvar(weight)

The color of the markers is determined by the value of the variable specified in option colorvar(). The weight variable is partitioned into four levels: <=2000, (2000, 3000], (3000, 4000], and (4000, 5000]. The markers are colored based on the level to which they belong.
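
As a rough check on how these intervals partition the data, one can bin weight by hand. The sketch below assumes cutpoints of 2000, 3000, and 4000, read off the intervals described above. The variable name wlevel is made up for illustration, and irecode() is used only to mimic the grouping; nothing here says that colorvar() computes its levels this way.

```stata
sysuse auto, clear

* irecode() returns 0 if weight <= 2000, 1 if 2000 < weight <= 3000,
* 2 if 3000 < weight <= 4000, and 3 if weight > 4000
generate byte wlevel = irecode(weight, 2000, 3000, 4000)

* Count the cars in each level (wlevel is a hypothetical helper variable)
tabulate wlevel
```

Under that assumption, each row of the table should correspond to one color in the graph's legend.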

To draw a bar plot of change against date with the bars colored according to volume, using cutpoints that we choose ourselves, type

. sysuse sp500, clear
(S&P 500)

. twoway bar change date, colorvar(volume) colorcuts(5000(10000)25000)

The color of each bar is determined by the value of the variable volume. The volume variable is partitioned into three levels determined by the option colorcuts(): <=5000, (5000, 15000], and (15000, 25000].
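
The argument of colorcuts() is written in numlist shorthand, so it can help to expand it and see the cutpoints explicitly. The sketch below does that with the numlist command and then bins volume by hand with irecode(). The variable name vlevel is hypothetical, and the binning only imitates the intervals described above rather than reproducing the option's internals.

```stata
sysuse sp500, clear

* Expand the shorthand 5000(10000)25000 into its individual values
numlist "5000(10000)25000"
display "`r(numlist)'"

* Bin volume at those cutpoints: 0 if volume <= 5000, 1 if in (5000, 15000],
* 2 if in (15000, 25000], and 3 if volume > 25000
generate byte vlevel = irecode(volume, 5000, 15000, 25000)

* Count the trading days in each level (vlevel is a hypothetical helper variable)
tabulate vlevel
```

If no observation exceeds the last cutpoint, the table shows only levels 0 through 2, matching the three intervals listed above.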

We can specify the colordiscrete option to treat the variable specified in option colorvar() as discrete. To draw a scatterplot of mpg against price with the markers colored according to the distinct values of a rounded-down weight variable, type

. sysuse auto, clear
(1978 automobile data)

. gen weight2 = int(weight / 1000) * 1000

. twoway scatter mpg price, colorvar(weight2) colordiscrete

The color of the markers is determined by the value of the variable, weight2. There are four levels: 1000, 2000, 3000, and 4000. Note that, for colordiscrete, the level is a point instead of an interval.
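
Before graphing, it can be useful to confirm which distinct values the discrete variable takes. The following sketch rebuilds weight2 exactly as above and then lists its values with levelsof and tabulate; it is only a way to inspect the data and is not required by colordiscrete.

```stata
sysuse auto, clear

* Round weight down to the nearest multiple of 1000, as in the example above
generate weight2 = int(weight / 1000) * 1000

* List the distinct values of weight2
levelsof weight2

* Show how many cars fall at each value
tabulate weight2
```

Each distinct value reported here is a level that receives its own color when colordiscrete is specified.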

In the three examples above, the legend is a clegend (the type of legend used for contour plots) that corresponds to a z axis; this legend is suited to numerical variables. The combination of options coloruseplegend and colordiscrete is useful for displaying categorical variables. To draw a scatterplot of mpg against price with the markers colored according to the categorical variable foreign, type

. sysuse auto, clear
(1978 automobile data)

. twoway scatter mpg price, colorvar(foreign) colordiscrete         ///
             colorrule(phue) zlabel(, valuelabel) coloruseplegend   ///
             plegend(order(2 1))

The color of the markers is determined by the value of the variable foreign. Because the colordiscrete option is used, the colors correspond to the two levels of foreign, 0 and 1. The markers are colored using the colors of p1 and p2, the first and second colors used by the graph scheme, because option colorrule(phue) is specified. The legend is a plegend (the type of legend used for contour-line plots) instead of a clegend because option coloruseplegend is specified. The plegend keys are labeled using the value label of variable foreign because option zlabel(, valuelabel) is specified. The legend keys are reordered to show “Domestic” first because option plegend(order(2 1)) is specified.
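
For comparison, a similar picture can be built without colorvar() by overlaying one scatterplot per category, which is the traditional twoway idiom. The sketch below shows that alternative; it is not a description of how colorvar() works internally. The legend text is typed by hand, on the assumption that foreign equal to 0 means Domestic and 1 means Foreign.

```stata
sysuse auto, clear

* One scatter per category; the scheme assigns the p1 style to the first plot
* and the p2 style to the second
twoway (scatter mpg price if foreign == 0) ///
       (scatter mpg price if foreign == 1), ///
       legend(order(1 "Domestic" 2 "Foreign"))
```

The overlay needs one plot specification per category, whereas colorvar() with colordiscrete handles all the categories in a single plot.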

Tell me more

Read more in the Stata Graphics Reference Manual; see [G] colorvar_options.
