margins and marginsplot for a categorical predictor variable

Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, age, bmi, diabetes, and hlthstat.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)

. summarize bpsystol hlthstat diabetes age bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297

We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. hlthstat records health status in categories coded 1 to 5. diabetes is a binary indicator of diabetes status coded 0 or 1. age measures age and ranges from 20 to 74 years. And bmi measures body mass index and ranges from 12.4 to 61.1 kg/m².

The description tells us that the value label hlth is attached to the variable hlthstat. Let's type label list hlth to view the categories of hlthstat.

. label list hlth

hlth:
           1 Excellent
           2 Very good
           3 Good
           4 Fair
           5 Poor
          .a Blank but applicable

The variable hlthstat has five categories numbered 1 through 5 and labeled “Excellent”, “Very good”, “Good”, “Fair”, and “Poor”, respectively. Category “.a” is a missing value that will be omitted from the regression model.
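To see how many observations this affects, we can tabulate hlthstat with its missing values shown and count them. This is a minimal sketch, not part of the original walkthrough; the summarize output above reports 10,335 nonmissing values of hlthstat out of 10,351 observations, so we would expect 16 missing values.

```stata
* Minimal sketch: inspect the missing values of hlthstat
webuse nhanes2l, clear

* The missing option adds a row for each missing-value code, such as .a
tabulate hlthstat, missing

* Count the observations with any missing value of hlthstat
count if missing(hlthstat)
```

These observations are the ones that regress drops when hlthstat is used as a predictor.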

Let's fit a linear regression model using the continuous outcome variable bpsystol and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that hlthstat is a categorical predictor.

. regress bpsystol i.hlthstat

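      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   2.981587   .6415165     4.65   0.000      1.72409    4.239083
       Good  |   8.034913   .6230047    12.90   0.000     6.813703    9.256123
       Fair  |   14.71925    .721698    20.40   0.000     13.30459    16.13392
       Poor  |   16.42304   .9580047    17.14   0.000     14.54517    18.30092
             |
       _cons |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
------------------------------------------------------------------------------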

The output does not include the “Excellent” hlthstat category because Stata uses the category with the smallest number as the referent category. So the coefficient labeled “_cons” is the expected SBP for the “Excellent” category of hlthstat.

Each of the remaining coefficients is the difference between the expected SBP in that group and the expected SBP in the “Excellent” group. For example, the expected SBP in the “Poor” group is 16.42304 mmHg higher than in the “Excellent” group.
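If a different comparison group is more useful, factor-variable notation lets us choose the base category. The sketch below is an illustration rather than part of the original analysis: it uses the ib5. prefix to make “Poor” the base, stores the constant in a scalar named sbp_poor (a name chosen only for this example), and then refits the original model so that the results discussed below still apply.

```stata
* Minimal sketch: change the base (referent) category of hlthstat
webuse nhanes2l, clear

* ib5. makes category 5 ("Poor") the base instead of category 1
regress bpsystol ib5.hlthstat

* _cons is now the expected SBP in the "Poor" group
* (sbp_poor is a hypothetical scalar name used for illustration)
scalar sbp_poor = _b[_cons]
display "E(SBP | hlthstat=Poor) = " sbp_poor

* Refit with the default base so later commands use the original model
regress bpsystol i.hlthstat
```

With “Poor” as the base, the other coefficients become differences from the “Poor” group instead of from the “Excellent” group. The fitted group means themselves do not change.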

We could estimate the expected SBP in the “Poor” group by adding the coefficient for “_cons” and the coefficient for “Poor”.

. display "E(SBP | hlthstat=Poor) = "  124.3191 + 16.42304
E(SBP | hlthstat=Poor) = 140.74214

We could do the same calculation for the other groups:

. display "E(SBP | hlthstat=Very good) = "  124.3191 + 2.981587
E(SBP | hlthstat=Very good) = 127.30069

. display "E(SBP | hlthstat=Good) = "  124.3191 + 8.034913
E(SBP | hlthstat=Good) = 132.35401

. display "E(SBP | hlthstat=Fair) = "  124.3191 + 14.71925
E(SBP | hlthstat=Fair) = 139.03835
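Rather than retyping rounded coefficients, we can refer to the stored estimates directly. The following sketch shows two ways to do this after regress: summing _b[] values with display, and using lincom, which also reports a standard error and confidence interval for the sum.

```stata
* Minimal sketch: compute the expected SBP for "Poor" from stored results
webuse nhanes2l, clear
regress bpsystol i.hlthstat

* Add the stored coefficients instead of typing the numbers by hand
display "E(SBP | hlthstat=Poor) = " _b[_cons] + _b[5.hlthstat]

* lincom estimates the same linear combination with a standard error
lincom _cons + 5.hlthstat
```

The term 5.hlthstat refers to the coefficient for category 5 of hlthstat, which is labeled “Poor”.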

Stata's margins command will estimate the expected SBP for each group. Note that the “i.” prefix is required in the regress command but not in the margins command.

. margins hlthstat

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

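------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Excellent  |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
  Very good  |   127.3007   .4451924   285.95   0.000      126.428    128.1733
       Good  |    132.354   .4180763   316.58   0.000     131.5345    133.1735
       Fair  |   139.0383   .5545276   250.73   0.000     137.9513    140.1253
       Poor  |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------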

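The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero, which is rarely a hypothesis of interest for blood pressure, so the confidence intervals are usually the more useful part of the output.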
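Because hlthstat is the only predictor in this model, each margin is simply the sample mean of bpsystol within that category. The sketch below checks this with tabstat and summarize. The equality holds for this model only; once other covariates are added, margins averages model predictions and the results generally differ from the raw group means.

```stata
* Minimal sketch: compare the margins with raw group means
webuse nhanes2l, clear

* Mean SBP and number of observations within each health-status category
tabstat bpsystol, by(hlthstat) statistics(mean n)

* The same check for one group, here category 5 ("Poor")
summarize bpsystol if hlthstat == 5
```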

We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.

. marginsplot

Variables that uniquely identify margins: hlthstat

By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”.

. marginsplot, recast(bar)

Variables that uniquely identify margins: hlthstat

We can add the horizontal option to create a horizontal bar chart.

. marginsplot, recast(bar) horizontal

Variables that uniquely identify margins: hlthstat
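Other renderings follow the same pattern as recast(bar). As a hedged sketch of two alternatives, recast(scatter) draws a marker for each category without connecting lines, and noci suppresses the confidence intervals. Which version reads best depends on the audience, so treat these as options to try, not as recommendations.

```stata
* Minimal sketch: alternative marginsplot renderings
webuse nhanes2l, clear
regress bpsystol i.hlthstat
margins hlthstat

* Markers with confidence intervals, no connecting lines
marginsplot, recast(scatter)

* Default profile plot without confidence intervals
marginsplot, noci
```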

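Let's add some additional options to make our graph look nicer. We can use the plotopts(barwidth(0.8)) option to add some space between the bars. And we can use options such as title(), subtitle(), xtitle(), and ytitle() to add titles to our graph; here we use the first three. The command is long, so it is split across lines with the /// continuation marker, which works in a do-file. In the Command window, type the command on a single line without ///.

. marginsplot, recast(bar) horizontal plotopts(barwidth(0.8))     ///
>              title("Expected systolic blood pressure (mmHg)")    ///
>              subtitle("By health status")                        ///
>              xtitle("Expected systolic blood pressure (mmHg)")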

Variables that uniquely identify margins: hlthstat
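As noted at the start, the same syntax carries over to other estimation commands. The sketch below is an illustration that is not part of the original analysis: it treats the 0/1 variable diabetes as a binary outcome and fits a logistic regression on hlthstat. After logistic, margins reports predicted probabilities by default instead of linear predictions.

```stata
* Minimal sketch: the same workflow with a logistic regression
webuse nhanes2l, clear

* diabetes is coded 0/1, so it can serve as a binary outcome
logistic diabetes i.hlthstat

* Predicted probability of diabetes in each health-status category
margins hlthstat

* Plot the predicted probabilities with their confidence intervals
marginsplot
```

The only changes from the linear model are the estimation command and the outcome variable; the margins and marginsplot lines are identical.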

You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.