Box plots are a popular tool used to visualize the distribution of a continuous variable for each group of a categorical variable. You can use Stata's graph box command to create simple box plots, or you can add options to make more sophisticated charts.
Let's begin by opening the nhanes2l dataset and using tabstat to view the minimum, maximum, 25th, 50th, and 75th percentiles of age for each category of hlthstat.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. tabstat age, statistics(min p25 p50 p75 max) by(hlthstat)

Summary for variables: age
Group variable: hlthstat (Health status)

 hlthstat |  Min  p25  p50  p75  Max
----------+-------------------------
Excellent |   20   26   36   52   74
Very good |   20   28   40   61   74
     Good |   20   34   52   64   74
     Fair |   20   48   62   67   74
     Poor |   21   56   62   68   74
----------+-------------------------
    Total |   20   31   49   63   74
Let's use graph box to create a simple box plot for age over the five categories of hlthstat.
. graph box age, over(hlthstat)
The center line in each box represents the 50th percentile (median) of age in its respective category of hlthstat. The bottom of each box represents the 25th percentile of age, and the top of each box represents the 75th percentile. The interquartile range (IQR) is the difference between the 75th and 25th percentiles. The "whisker" below the box extends to the "lower adjacent value", which is the smallest observed value that is greater than or equal to the 25th percentile minus 1.5 times the IQR. The "whisker" above the box extends to the "upper adjacent value", which is the largest observed value that is less than or equal to the 75th percentile plus 1.5 times the IQR. Observations beyond the adjacent values are called "outside values" and are plotted as individual markers.
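To make the whisker definition concrete, here is a minimal do-file sketch that rebuilds the adjacent values by hand for one group. It assumes that hlthstat == 5 is the "Poor" category (check with label list before relying on it), and the scalar names are my own. From the Poor row of the tabstat table, the quartiles are 56 and 68, so the IQR is 12 and the whiskers can reach no farther than 38 and 86. The minimum age in that group is 21, so at least one observation should fall below the lower limit and be plotted as an outside value.

```stata
* Sketch: reproduce the whiskers for one group by hand.
* Assumption: hlthstat == 5 is the "Poor" category.
webuse nhanes2l, clear

* Quartiles and interquartile range of age in this group
quietly summarize age if hlthstat == 5, detail
scalar q1  = r(p25)
scalar q3  = r(p75)
scalar iqr = scalar(q3) - scalar(q1)

* Fences: the farthest a whisker may reach from the box
scalar lo_fence = scalar(q1) - 1.5*scalar(iqr)
scalar hi_fence = scalar(q3) + 1.5*scalar(iqr)

* Adjacent values: the most extreme observations inside the fences
quietly summarize age if hlthstat == 5 & inrange(age, scalar(lo_fence), scalar(hi_fence))
scalar lower_adjacent = r(min)
scalar upper_adjacent = r(max)
display "lower adjacent value = " scalar(lower_adjacent)
display "upper adjacent value = " scalar(upper_adjacent)

* Outside values: observations beyond the fences
count if hlthstat == 5 & !missing(age) & !inrange(age, scalar(lo_fence), scalar(hi_fence))
```

The two displayed values should match the ends of the whiskers for the Poor category in the box plot, and the final count is the number of markers drawn beyond them.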
Next, let's add a title to our graph. Note that I'm using the “triple slash” to write my command across two lines. You can't do this in the Command window, but it is useful when writing long graph commands in do-files.
. graph box age, over(hlthstat)                     ///
        title("Box plot of age by health status")
We could rotate our graph to make a horizontal box plot by using graph hbox. This is useful when the categories have long names.
. graph hbox age, over(hlthstat)                    ///
        title("Box plot of age by health status")
We could also view our box plot over categories of diabetes and hlthstat.
. graph hbox age, over(diabetes) over(hlthstat)     ///
        title("Box plot of age by diabetes and health status")
We could add the asyvars option to plot the boxes for people with and without diabetes using different colors.
. graph hbox age, over(diabetes) over(hlthstat) asyvars     ///
        title("Box plot of age by diabetes and health status")
And we can use the legend() option to display the legend in one row below the title.
. graph hbox age, over(diabetes) over(hlthstat) asyvars     ///
        title("Box plot of age by diabetes and health status")  ///
        legend(rows(1) position(12))
There are many other options that you can use to customize your box plots, and you can read about them in the manual. You can also watch a demonstration of these commands by clicking on the link to the YouTube video below.
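As one illustration of those other options, the do-file sketch below hides the outside values with nooutsides, labels the numeric axis with ytitle(), and stores the graph in memory with name(). The axis label text and the graph name age_by_health are my own choices, not part of the original example. Note that in graph hbox the numeric axis is still called the y axis, even though it is drawn horizontally.

```stata
* Sketch only: the axis label and graph name are illustrative choices.
webuse nhanes2l, clear

graph hbox age, over(hlthstat) nooutsides           ///
        ytitle("Age (years)")                       ///
        title("Box plot of age by health status")   ///
        name(age_by_health, replace)
```

Naming the graph keeps it in memory, so you can redisplay it later with graph display age_by_health.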
Watch Box plots in Stata.
Read more in the Stata Graphics Reference Manual; see [G] graph box.
Read more in the Stata Base Reference Manual; see [R] tabstat.