Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, age, bmi, diabetes, and hlthstat.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)
. summarize bpsystol hlthstat diabetes age bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297
We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. hlthstat measures health status in categories coded 1 to 5. diabetes is an indicator of diabetes status coded 0 or 1. age measures age and ranges from 20 to 74 years. And bmi measures body mass index and ranges from 12.4 to 61.1 kg/m².
The description tells us that the value label hlth is attached to the variable hlthstat. Let's type label list hlth to view the categories of hlthstat.
. label list hlth
hlth:
           1 Excellent
           2 Very good
           3 Good
           4 Fair
           5 Poor
          .a Blank but applicable
The variable hlthstat has five categories numbered 1 through 5 and labeled “Excellent”, “Very good”, “Good”, “Fair”, and “Poor”, respectively. Category “.a” is a missing value that will be omitted from the regression model.
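It can be worth checking how many observations will be dropped for this reason before fitting the model. The summarize output reported 10,335 nonmissing values of hlthstat against 10,351 for bpsystol. A minimal sketch of that check, assuming the dataset is loaded as above:

```stata
* Load the example dataset (replaces any data in memory)
webuse nhanes2l, clear

* Tabulate every category of hlthstat, including missing-value codes such as .a
tabulate hlthstat, missing

* Count the observations that regress will omit because hlthstat is missing
count if missing(hlthstat)
```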
Let's fit a linear regression model using the continuous outcome variable bpsystol and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that hlthstat is a categorical predictor.
. regress bpsystol i.hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   2.981587   .6415165     4.65   0.000      1.72409    4.239083
       Good  |   8.034913   .6230047    12.90   0.000     6.813703    9.256123
       Fair  |   14.71925    .721698    20.40   0.000     13.30459    16.13392
       Poor  |   16.42304   .9580047    17.14   0.000     14.54517    18.30092
             |
       _cons |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
------------------------------------------------------------------------------
The output does not include the “Excellent” hlthstat category because Stata uses the category with the smallest number as the referent category. So the coefficient labeled “_cons” is the expected SBP for the “Excellent” category of hlthstat.
The remaining coefficients are the differences in expected SBP between each of the other groups and the “Excellent” group. For example, the expected SBP in the “Poor” group is 16.42304 mmHg higher than in the “Excellent” group.
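The choice of reference category is a default, not a requirement. As a brief sketch, factor-variable notation's ib. operator can set a different base level. The choice of “Good” (level 3) below is only an illustration:

```stata
* Load the example dataset (replaces any data in memory)
webuse nhanes2l, clear

* Use "Good" (level 3) as the reference category instead of "Excellent"
regress bpsystol ib3.hlthstat

* Or use the highest-numbered level, "Poor", as the reference category
regress bpsystol ib(last).hlthstat
```

Changing the base level changes which differences the coefficients report, but the expected SBP for each group stays the same.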
We could estimate the expected SBP in the “Poor” group by adding the coefficient for “_cons” and the coefficient for “Poor”.
. display "E(SBP | hlthstat=Poor) = " 124.3191 + 16.42304
E(SBP | hlthstat=Poor) = 140.74214
We could do the same calculation for the other groups:
. display "E(SBP | hlthstat=Very good) = " 124.3191 + 2.981587
E(SBP | hlthstat=Very good) = 127.30069

. display "E(SBP | hlthstat=Good) = " 124.3191 + 8.034913
E(SBP | hlthstat=Good) = 132.35401

. display "E(SBP | hlthstat=Fair) = " 124.3191 + 14.71925
E(SBP | hlthstat=Fair) = 139.03835
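Typing rounded coefficients by hand is error-prone. One alternative, sketched here, is to refer to the stored coefficients directly with _b[] or to use lincom, which also reports a standard error and confidence interval for the sum. The example uses the “Poor” category (level 5):

```stata
* Load the data and refit the model so the coefficients are in memory
webuse nhanes2l, clear
regress bpsystol i.hlthstat

* Point estimate only, computed from the full-precision stored coefficients
display "E(SBP | hlthstat=Poor) = " _b[_cons] + _b[5.hlthstat]

* Same sum, with a standard error, t statistic, and 95% confidence interval
lincom _cons + 5.hlthstat
```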
Stata's margins command will estimate the expected SBP for each group. Note that the “i.” prefix is required in the regress command but not in the margins command.
. margins hlthstat

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Excellent  |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
  Very good  |   127.3007   .4451924   285.95   0.000      126.428    128.1733
       Good  |    132.354   .4180763   316.58   0.000     131.5345    133.1735
       Fair  |   139.0383   .5545276   250.73   0.000     137.9513    140.1253
       Poor  |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------
The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.
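Because this model contains a single categorical predictor and nothing else, each margin should equal the sample mean of bpsystol within that category. A quick sanity check along those lines, restricted to the estimation sample, might look like this:

```stata
* Load the data, refit the model, and compute the margins
webuse nhanes2l, clear
regress bpsystol i.hlthstat
margins hlthstat

* Sample mean and count of SBP within each health-status category,
* using only the observations that entered the regression
tabstat bpsystol if e(sample), by(hlthstat) statistics(mean n)
```

This equivalence does not generally hold once other covariates are added to the model, which is where margins becomes most useful.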
We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.
. marginsplot

Variables that uniquely identify margins: hlthstat
By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”.
. marginsplot, recast(bar)

Variables that uniquely identify margins: hlthstat
We can add the horizontal option to create a horizontal bar chart.
. marginsplot, recast(bar) horizontal

Variables that uniquely identify margins: hlthstat
Let's add some additional options to make our graph look nicer. We can use the plotopts(barwidth(0.8)) option to add some space between the bars. And we can use the title(), subtitle(), xtitle(), and ytitle() options to add various titles to our graph.
. marginsplot, recast(bar) horizontal plotopts(barwidth(0.8))
>     title("Expected systolic blood pressure (mmHg)")
>     subtitle("By health status")
>     xtitle("Expected systolic blood pressure (mmHg)")

Variables that uniquely identify margins: hlthstat
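A command this long is easier to manage in a do-file, where /// continues a command onto the next line. The sketch below repeats the whole workflow in that style. The ytitle() text and the output filename sbp_by_health.png are illustrative assumptions, not part of the original command:

```stata
* Load the data, fit the model, and compute the margins
webuse nhanes2l, clear
regress bpsystol i.hlthstat
margins hlthstat

* Horizontal bar chart of the margins; /// continues the command
marginsplot, recast(bar) horizontal                       ///
    plotopts(barwidth(0.8))                                ///
    title("Expected systolic blood pressure (mmHg)")       ///
    subtitle("By health status")                           ///
    xtitle("Expected systolic blood pressure (mmHg)")      ///
    ytitle("Health status")   // assumed axis title, for illustration

* Save the graph to a file (hypothetical filename)
graph export sbp_by_health.png, replace
```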
You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation.
Read more in the Stata Base Reference Manual; see [R] margins, [R] marginsplot, and [R] regress. And in the Stata User’s Guide, see [U] 11.4.3 Factor variables.