Home  /  Resources & Support  /  Introduction to Stata basics  /  margins and marginsplot for the interaction of two categorical predictor variables

Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi

Variable Storage Display Value name type format label Variable label
bpsystol int %9.0g Systolic blood pressure hlthstat byte %20.0g hlth Health status diabetes byte %12.0g diabetes Diabetes status age byte %9.0g Age (years) bmi float %9.0g Body mass index (BMI)
. summarize bpsystol hlthstat diabetes age bmi
Variable Obs Mean Std. dev. Min Max
bpsystol 10,351 130.8817 23.33265 65 300
hlthstat 10,335 2.586164 1.206196 1 5
diabetes 10,349 .0482172 .2142353 0 1
age 10,351 47.57965 17.21483 20 74
bmi 10,351 25.5376 4.914969 12.3856 61.1297

We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) with a range of 65 to 300 mmHg. hlthstat measures health status with a range from 1 to 5. diabetes measures diabetes status with a range of 0 to 1. age measures age with a range of 20 to 74 years. And bmi measures body mass index with a range of 12.4 to 61.1 kg/m2.

Let's fit a linear regression model using the continuous outcome variable bpsystol, the binary predictor variable diabetes, and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that diabetes and hlthstat are categorical predictors, and I have used the “##” operator to request the main effects and interaction of both predictor variables.

. regress bpsystol i.hlthstat##i.diabetes

Source SS df MS Number of obs = 10,335
F(9, 10325) = 86.92
Model 396524.045 9 44058.2272 Prob > F = 0.0000
Residual 5233449.31 10,325 506.871604 R-squared = 0.0704
Adj R-squared = 0.0696
Total 5629973.35 10,334 544.800982 Root MSE = 22.514
bpsystol Coefficient Std. err. t P>|t| [95% conf. interval]
hlthstat
Very good 2.636051 .6417076 4.11 0.000 1.37818 3.893922
Good 7.648725 .6272209 12.19 0.000 6.419251 8.8782
Fair 13.50647 .7408272 18.23 0.000 12.0543 14.95863
Poor 14.77223 1.032484 14.31 0.000 12.74837 16.7961
diabetes
Diabetic 5.780232 4.618696 1.25 0.211 -3.273308 14.83377
hlthstat#
diabetes
Very good #
Diabetic 17.43339 5.726714 3.04 0.002 6.207924 28.65886
Good #
Diabetic 4.023894 5.032308 0.80 0.424 -5.840404 13.88819
Fair #
Diabetic 7.316062 4.97969 1.47 0.142 -2.445096 17.07722
Poor #
Diabetic 3.445358 5.09316 0.68 0.499 -6.538222 13.42894
_cons 124.2614 .4611975 269.43 0.000 123.3574 125.1655

The output can be challenging to interpret because we have two predictors and an interaction, which produces many estimated coefficients. We could spend our time carefully interpreting each coefficient, or we could calculate the expected SBP for every combination of diabetes status and health status. But Stata's margins command will estimate the expected SBP for combinations of the two predictor variables or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the regress command but not in the margins command.

Let's estimate marginal predictions of SBP for each combination of the categories of hlthstat and diabetes.

. margins diabetes#hlthstat

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
diabetes#
hlthstat
Not diabetic #
Excellent 124.2614 .4611975 269.43 0.000 123.3574 125.1655
Not diabetic #
Very good 126.8975 .4461899 284.40 0.000 126.0229 127.7721
Not diabetic #
Good 131.9102 .4250916 310.31 0.000 131.0769 132.7434
Not diabetic #
Fair 137.7679 .5797601 237.63 0.000 136.6315 138.9043
Not diabetic #
Poor 139.0337 .9237528 150.51 0.000 137.2229 140.8444
Diabetic #
Excellent 130.0417 4.595612 28.30 0.000 121.0334 139.05
Diabetic #
Very good 150.1111 3.356161 44.73 0.000 143.5324 156.6898
Diabetic #
Good 141.7143 1.952195 72.59 0.000 137.8876 145.541
Diabetic #
Fair 150.8642 1.768852 85.29 0.000 147.3969 154.3315
Diabetic #
Poor 148.2593 1.93768 76.51 0.000 144.461 152.0575

The numbers reported in the “Margin” column are average values of the linear prediction of SBP for each combination of the categories of hlthstat and diabetes. For example, the output tells us that the expected SBP is 124.2614 for a person in “Excellent” health without diabetes and the expected SBP is 148.2593 for a person in “Poor” health with diabetes.

The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.

We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.

. marginsplot

Variables that uniquely identify margins: diabetes hlthstat

The resulting profile plot shows a separate line for each category of hlthstat. We could change the plot by changing the order of the variables in the margins command.

. margins hlthstat#diabetes

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
hlthstat
diabetes
Excellent #
Not diabetic 124.2614 .4611975 269.43 0.000 123.3574 125.1655
Excellent #
Diabetic 130.0417 4.595612 28.30 0.000 121.0334 139.05
Very good #
Not diabetic 126.8975 .4461899 284.40 0.000 126.0229 127.7721
Very good #
Diabetic 150.1111 3.356161 44.73 0.000 143.5324 156.6898
Good #
Not diabetic 131.9102 .4250916 310.31 0.000 131.0769 132.7434
Good #
Diabetic 141.7143 1.952195 72.59 0.000 137.8876 145.541
Fair #
Not diabetic 137.7679 .5797601 237.63 0.000 136.6315 138.9043
Fair #
Diabetic 150.8642 1.768852 85.29 0.000 147.3969 154.3315
Poor #
Not diabetic 139.0337 .9237528 150.51 0.000 137.2229 140.8444
Poor #
Diabetic 148.2593 1.93768 76.51 0.000 144.461 152.0575
. marginsplot Variables that uniquely identify margins: hlthstat diabetes

By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”. The option xdimension() defines the display of the x axis.

. marginsplot, recast(bar) xdimension(hlthstat diabetes)

Variables that uniquely identify margins: hlthstat diabetes

The overlap of the horizontal axis labels make them impossible to read. We can make them legible by adding the horizontal option to create a horizontal bar chart.

. marginsplot, recast(bar) horizontal xdimension(hlthstat diabetes)

Let's add more options to make our graph look nicer. We can use the plotopts(barwidth)) option to add some space between the bars. And we can use the title(), subtitle(), xtitle(), and ytitle() options to add various titles to our graph.

. marginsplot, recast(bar) xdimension(hlthstat diabetes)
               horizontal plotopts(barwidth(0.8)) 
               ytitle("") 
               xtitle("Expected systolic blood pressure") 
               title("Expected systolic blood pressure (mmHg)") 
               subtitle("By health status and diabetes status")

Variables that uniquely identify margins: hlthstat diabetes

Marginal effects

We can also use margins to estimate marginal predictions for one variable averaged over other variables in the model. For example, we can estimate the expected SBP for categories of diabetes averaged over hlthstat.

. margins diabetes

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
diabetes
Not diabetic 130.3211 .2273218 573.29 0.000 129.8755 130.7667
Diabetic 143.041 1.50395 95.11 0.000 140.093 145.9891

Or we could estimate the expected SBP for categories of hlthstat averaged over diabetes.

. margins hlthstat

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
hlthstat
Excellent 124.5405 .4918267 253.22 0.000 123.5764 125.5046
Very good 128.0183 .4545142 281.66 0.000 127.1274 128.9092
Good 132.3835 .4154021 318.69 0.000 131.5693 133.1978
Fair 138.4002 .5583383 247.88 0.000 137.3058 139.4947
Poor 139.4791 .8841156 157.76 0.000 137.7461 141.2121

How does it work?

Method 1: Average response

Let's work a simpler example without the interaction to help us understand how margins works. Let's begin by using tabulate to create indicator variables for each category of hlthstat.

. tabulate hlthstat, generate(hlthstat)

Health status Freq. Percent Cum.
Excellent 2,407 23.29 23.29
Very good 2,591 25.07 48.36
Good 2,938 28.43 76.79
Fair 1,670 16.16 92.95
Poor 729 7.05 100.00
Total 10,335 100.00

Let's list the first 10 observations to check our work.

. list hlthstat hlthstat1 hlthstat2 hlthstat3 hlthstat4 hlthstat5 in 1/10

hlthstat hlthst~1 hlthst~2 hlthst~3 hlthst~4 hlthst~5
1. Very good 0 1 0 0 0
2. Very good 0 1 0 0 0
3. Good 0 0 1 0 0
4. Fair 0 0 0 1 0
5. Very good 0 1 0 0 0
6. Poor 0 0 0 0 1
7. Very good 0 1 0 0 0
8. Excellent 1 0 0 0 0
9. Very good 0 1 0 0 0
10. Poor 0 0 0 0 1

Next let's fit a linear regression model including diabetes and hlthstat without the interaction. The option coeflegend displays a legend that includes terms that refer to the coefficients in the model.

. regress bpsystol i.diabetes i.hlthstat, coeflegend

Source SS df MS Number of obs = 10,335
F(5, 10329) = 153.09
Model 388422.444 5 77684.4888 Prob > F = 0.0000
Residual 5241550.91 10,329 507.459668 R-squared = 0.0690
Adj R-squared = 0.0685
Total 5629973.35 10,334 544.800982 Root MSE = 22.527
bpsystol Coefficient Legend
diabetes
Diabetic 11.8325 _b[1.diabetes]
hlthstat
Very good 2.894063 _b[2.hlthstat]
Good 7.61725 _b[3.hlthstat]
Fair 13.68941 _b[4.hlthstat]
Poor 14.34982 _b[5.hlthstat]
_cons 124.2011 _b[_cons]

Let's display the contents of _b[2.hlthstat] to verify that it equals 2.894063.

. display _b[2.hlthstat]
2.894063

Now we can use coefficients and indicator variables to generate a new variable that equals the expected SBP assuming every observation in the sample does not have diabetes.

. generate sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[2.hlthstat] * hlthstat2
                                                      + _b[3.hlthstat] * hlthstat3
                                                      + _b[4.hlthstat] * hlthstat4
                                                      + _b[5.hlthstat] * hlthstat5
(16 missing values generated)

Next we can generate a new variable that equals the expected SBP assuming every observation in the sample has diabetes.

. generate sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[2.hlthstat] * hlthstat2
                                                      + _b[3.hlthstat] * hlthstat3
                                                      + _b[4.hlthstat] * hlthstat4
                                                      + _b[5.hlthstat] * hlthstat5
(16 missing values generated)

Then we can calculate the average of the two variables to estimate the expected SBP for people with and without diabetes.

. table (), statistic(mean sbp_diab0 sbp_diab1)

sbp_diab0 130.3163
sbp_diab1 142.1488

This matches the results reported by margins.

. margins diabetes

Predictive margins                                      Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
diabetes
Not diabetic 130.3163 .2274263 573.00 0.000 129.8705 130.7621
Diabetic 142.1488 1.0333 137.57 0.000 140.1233 144.1742

Method 2: Response at average

In the previous example, we first calculated the response for each observation and then calculated the average of those responses. This is the default method. But we could also calculate the average covariate values first and then report the response at those average values.

Let's begin by using table to estimate the proportion of observations in each category of hlthstat.

. table (hlthstat), statistic(proportion) nformat(%12.7f proportion) nototal

Proportion
Health status
Excellent 0.2328979
Very good 0.2507015
Good 0.2842767
Fair 0.1615868
Poor 0.0705370

Then we can use those proportions (averages) to estimate the expected SBP assuming no one in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 0  + _b[2.hlthstat] * 0.2507015
                                            + _b[3.hlthstat] * 0.2842767
                                            + _b[4.hlthstat] * 0.1615868
                                            + _b[5.hlthstat] * 0.0705370

We can also calculate the expected SBP assuming everyone in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 1  + _b[2.hlthstat] * 0.2507015
                                            + _b[3.hlthstat] * 0.2842767
                                            + _b[4.hlthstat] * 0.1615868
                                            + _b[5.hlthstat] * 0.0705370

And we can check our work using margins with the atmeans option.

. margins diabetes, atmeans

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()
At: 0.diabetes = .9517175 (mean)
    1.diabetes = .0482825 (mean)
    1.hlthstat = .2328979 (mean)
    2.hlthstat = .2507015 (mean)
    3.hlthstat = .2842767 (mean)
    4.hlthstat = .1615868 (mean)
    5.hlthstat =  .070537 (mean)

Delta-method
Margin std. err. t P>|t| [95% conf. interval]
diabetes
Not diabetic 130.3163 .2274263 573.00 0.000 129.8705 130.7621
Diabetic 142.1488 1.0333 137.57 0.000 140.1233 144.1742

Estimating the average response (method 1) and the response at the average (method 2) gives us the same results for linear regression. But the results may differ for generalized linear models such as probit, logistic, or Poisson regression.

You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.