Stata's **margins** and **marginsplot** commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including **probit**, **logistic**, **poisson**, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the **nhanes2l** dataset. Then let's **describe** and **summarize** the variables **bpsystol**, **hlthstat**, **diabetes**, **age**, and **bmi**.

.webuse nhanes2l(Second National Health and Nutrition Examination Survey) .describe bpsystol hlthstat diabetes age bmi

Variable Storage Display Value name type format label Variable label |

bpsystol int %9.0g Systolic blood pressure hlthstat byte %20.0g hlth Health status diabetes byte %12.0g diabetes Diabetes status age byte %9.0g Age (years) bmi float %9.0g Body mass index (BMI) |

Variable | Obs Mean Std. dev. Min Max | |

bpsystol | 10,351 130.8817 23.33265 65 300 | |

hlthstat | 10,335 2.586164 1.206196 1 5 | |

diabetes | 10,349 .0482172 .2142353 0 1 | |

age | 10,351 47.57965 17.21483 20 74 | |

bmi | 10,351 25.5376 4.914969 12.3856 61.1297 |

We are going to fit a series of linear regression models for the outcome variable **bpsystol**, which measures systolic blood pressure (SBP) with a range of 65 to 300 mmHg. **hlthstat** measures health status with a range from 1 to 5. **diabetes** measures diabetes status with a range of 0 to 1. **age** measures age with a range of 20 to 74 years. And **bmi** measures body mass index with a range of 12.4 to 61.1 kg/m^{2}.

Let's fit a linear regression model using the continuous outcome variable **bpsystol**, the binary predictor variable **diabetes**, and the categorical predictor variable **hlthstat**. Note that I have used factor-variable notation to tell Stata that **diabetes** and **hlthstat** are categorical predictors, and I have used the “##” operator to request the main effects and interaction of both predictor variables.

.regress bpsystol i.hlthstat##i.diabetes

Source | SS df MS | Number of obs = 10,335 | |

F(9, 10325) = 86.92 | |||

Model | 396524.045 9 44058.2272 | Prob > F = 0.0000 | |

Residual | 5233449.31 10,325 506.871604 | R-squared = 0.0704 | |

Adj R-squared = 0.0696 | |||

Total | 5629973.35 10,334 544.800982 | Root MSE = 22.514 |

bpsystol | Coefficient Std. err. t P>|t| [95% conf. interval] | |

hlthstat | ||

Very good | 2.636051 .6417076 4.11 0.000 1.37818 3.893922 | |

Good | 7.648725 .6272209 12.19 0.000 6.419251 8.8782 | |

Fair | 13.50647 .7408272 18.23 0.000 12.0543 14.95863 | |

Poor | 14.77223 1.032484 14.31 0.000 12.74837 16.7961 | |

diabetes | ||

Diabetic | 5.780232 4.618696 1.25 0.211 -3.273308 14.83377 | |

hlthstat# | ||

diabetes | ||

Very good # | ||

Diabetic | 17.43339 5.726714 3.04 0.002 6.207924 28.65886 | |

Good # | ||

Diabetic | 4.023894 5.032308 0.80 0.424 -5.840404 13.88819 | |

Fair # | ||

Diabetic | 7.316062 4.97969 1.47 0.142 -2.445096 17.07722 | |

Poor # | ||

Diabetic | 3.445358 5.09316 0.68 0.499 -6.538222 13.42894 | |

_cons | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |

The output can be challenging to interpret because we have two predictors and an interaction, which produces many estimated coefficients. We could spend our time carefully interpreting each coefficient, or we could calculate the expected SBP for every combination of diabetes status and health status. But Stata's **margins** command will estimate the expected SBP for combinations of the two predictor variables or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the **regress** command but not in the **margins** command.

Let's estimate marginal predictions of SBP for each combination of the categories of **hlthstat** and **diabetes**.

.margins diabetes#hlthstatAdjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

diabetes# | ||

hlthstat | ||

Not diabetic # | ||

Excellent | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |

Not diabetic # | ||

Very good | 126.8975 .4461899 284.40 0.000 126.0229 127.7721 | |

Not diabetic # | ||

Good | 131.9102 .4250916 310.31 0.000 131.0769 132.7434 | |

Not diabetic # | ||

Fair | 137.7679 .5797601 237.63 0.000 136.6315 138.9043 | |

Not diabetic # | ||

Poor | 139.0337 .9237528 150.51 0.000 137.2229 140.8444 | |

Diabetic # | ||

Excellent | 130.0417 4.595612 28.30 0.000 121.0334 139.05 | |

Diabetic # | ||

Very good | 150.1111 3.356161 44.73 0.000 143.5324 156.6898 | |

Diabetic # | ||

Good | 141.7143 1.952195 72.59 0.000 137.8876 145.541 | |

Diabetic # | ||

Fair | 150.8642 1.768852 85.29 0.000 147.3969 154.3315 | |

Diabetic # | ||

Poor | 148.2593 1.93768 76.51 0.000 144.461 152.0575 | |

The numbers reported in the “Margin” column are average values of the linear prediction of SBP for each combination of the categories of **hlthstat** and **diabetes**. For example, the output tells us that the expected SBP is 124.2614 for a person in “Excellent” health without diabetes and the expected SBP is 148.2593 for a person in “Poor” health with diabetes.

The output also reports a standard error, *t* statistic, *p*-value, and 95% confidence interval for each estimate. The *t* statistic tests the null hypothesis that the expected SBP is zero.

We can plot the marginal predictions and their 95% confidence intervals by typing **marginsplot**.

.marginsplotVariables that uniquely identify margins: diabetes hlthstat

The resulting profile plot shows a separate line for each category of **hlthstat**. We could change the plot by changing the order of the variables in the **margins** command.

.margins hlthstat#diabetesAdjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

hlthstat | ||

diabetes | ||

Excellent # | ||

Not diabetic | 124.2614 .4611975 269.43 0.000 123.3574 125.1655 | |

Excellent # | ||

Diabetic | 130.0417 4.595612 28.30 0.000 121.0334 139.05 | |

Very good # | ||

Not diabetic | 126.8975 .4461899 284.40 0.000 126.0229 127.7721 | |

Very good # | ||

Diabetic | 150.1111 3.356161 44.73 0.000 143.5324 156.6898 | |

Good # | ||

Not diabetic | 131.9102 .4250916 310.31 0.000 131.0769 132.7434 | |

Good # | ||

Diabetic | 141.7143 1.952195 72.59 0.000 137.8876 145.541 | |

Fair # | ||

Not diabetic | 137.7679 .5797601 237.63 0.000 136.6315 138.9043 | |

Fair # | ||

Diabetic | 150.8642 1.768852 85.29 0.000 147.3969 154.3315 | |

Poor # | ||

Not diabetic | 139.0337 .9237528 150.51 0.000 137.2229 140.8444 | |

Poor # | ||

Diabetic | 148.2593 1.93768 76.51 0.000 144.461 152.0575 | |

By default, **marginsplot** creates a profile plot using lines. We can use the **recast(bar)** option if we prefer a bar chart, or “dynamite plunger plot”. The option **xdimension()** defines the display of the *x* axis.

.marginsplot, recast(bar) xdimension(hlthstat diabetes)Variables that uniquely identify margins: hlthstat diabetes

The overlap of the horizontal axis labels make them impossible to read. We can make them legible by adding the **horizontal** option to create a horizontal bar chart.

.marginsplot, recast(bar) horizontal xdimension(hlthstat diabetes)

Let's add more options to make our graph look nicer. We can use the **plotopts(barwidth))** option to add some space between the bars. And we can use the **title()**, **subtitle()**, **xtitle()**, and **ytitle()** options to add various titles to our graph.

.marginsplot, recast(bar) xdimension(hlthstat diabetes) horizontal plotopts(barwidth(0.8)) ytitle("") xtitle("Expected systolic blood pressure") title("Expected systolic blood pressure (mmHg)") subtitle("By health status and diabetes status")Variables that uniquely identify margins: hlthstat diabetes

We can also use **margins** to estimate marginal predictions for one variable averaged over other variables in the model. For example, we can estimate the expected SBP for categories of **diabetes** averaged over **hlthstat**.

.margins diabetesPredictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

diabetes | ||

Not diabetic | 130.3211 .2273218 573.29 0.000 129.8755 130.7667 | |

Diabetic | 143.041 1.50395 95.11 0.000 140.093 145.9891 | |

Or we could estimate the expected SBP for categories of **hlthstat** averaged over **diabetes**.

.margins hlthstatPredictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

hlthstat | ||

Excellent | 124.5405 .4918267 253.22 0.000 123.5764 125.5046 | |

Very good | 128.0183 .4545142 281.66 0.000 127.1274 128.9092 | |

Good | 132.3835 .4154021 318.69 0.000 131.5693 133.1978 | |

Fair | 138.4002 .5583383 247.88 0.000 137.3058 139.4947 | |

Poor | 139.4791 .8841156 157.76 0.000 137.7461 141.2121 | |

Let's work a simpler example without the interaction to help us understand how **margins** works. Let's begin by using **tabulate** to create indicator variables for each category of **hlthstat**.

.tabulate hlthstat, generate(hlthstat)

Health status | Freq. Percent Cum. | |

Excellent | 2,407 23.29 23.29 | |

Very good | 2,591 25.07 48.36 | |

Good | 2,938 28.43 76.79 | |

Fair | 1,670 16.16 92.95 | |

Poor | 729 7.05 100.00 | |

Total | 10,335 100.00 |

Let's list the first 10 observations to check our work.

.list hlthstat hlthstat1 hlthstat2 hlthstat3 hlthstat4 hlthstat5 in 1/10

hlthstat hlthst~1 hlthst~2 hlthst~3 hlthst~4 hlthst~5 | |

1. | Very good 0 1 0 0 0 |

2. | Very good 0 1 0 0 0 |

3. | Good 0 0 1 0 0 |

4. | Fair 0 0 0 1 0 |

5. | Very good 0 1 0 0 0 |

6. | Poor 0 0 0 0 1 |

7. | Very good 0 1 0 0 0 |

8. | Excellent 1 0 0 0 0 |

9. | Very good 0 1 0 0 0 |

10. | Poor 0 0 0 0 1 |

Next let's fit a linear regression model including **diabetes** and **hlthstat** without the interaction. The option **coeflegend** displays a legend that includes terms that refer to the coefficients in the model.

.regress bpsystol i.diabetes i.hlthstat, coeflegend

Source | SS df MS | Number of obs = 10,335 | |

F(5, 10329) = 153.09 | |||

Model | 388422.444 5 77684.4888 | Prob > F = 0.0000 | |

Residual | 5241550.91 10,329 507.459668 | R-squared = 0.0690 | |

Adj R-squared = 0.0685 | |||

Total | 5629973.35 10,334 544.800982 | Root MSE = 22.527 |

bpsystol | Coefficient Legend | |

diabetes | ||

Diabetic | 11.8325 _b[1.diabetes] | |

hlthstat | ||

Very good | 2.894063 _b[2.hlthstat] | |

Good | 7.61725 _b[3.hlthstat] | |

Fair | 13.68941 _b[4.hlthstat] | |

Poor | 14.34982 _b[5.hlthstat] | |

_cons | 124.2011 _b[_cons] | |

Let's display the contents of **_b[2.hlthstat]** to verify that it equals 2.894063.

.display _b[2.hlthstat]2.894063

Now we can use coefficients and indicator variables to **generate** a new variable that equals the expected SBP assuming every observation in the sample does not have diabetes.

.generate sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[2.hlthstat] * hlthstat2 + _b[3.hlthstat] * hlthstat3 + _b[4.hlthstat] * hlthstat4 + _b[5.hlthstat] * hlthstat5(16 missing values generated)

Next we can **generate** a new variable that equals the expected SBP assuming every observation in the sample has diabetes.

.generate sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[2.hlthstat] * hlthstat2 + _b[3.hlthstat] * hlthstat3 + _b[4.hlthstat] * hlthstat4 + _b[5.hlthstat] * hlthstat5(16 missing values generated)

Then we can calculate the average of the two variables to estimate the expected SBP for people with and without diabetes.

.table (), statistic(mean sbp_diab0 sbp_diab1)

sbp_diab0 | 130.3163 | |

sbp_diab1 | 142.1488 | |

This matches the results reported by **margins**.

.margins diabetesPredictive margins Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict()

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

diabetes | ||

Not diabetic | 130.3163 .2274263 573.00 0.000 129.8705 130.7621 | |

Diabetic | 142.1488 1.0333 137.57 0.000 140.1233 144.1742 | |

In the previous example, we first calculated the response for each observation and then calculated the average of those responses. This is the default method. But we could also calculate the average covariate values first and then report the response at those average values.

Let's begin by using **table** to estimate the proportion of observations in each category of **hlthstat**.

.table (hlthstat), statistic(proportion) nformat(%12.7f proportion) nototal

Proportion | ||

Health status | ||

Excellent | 0.2328979 | |

Very good | 0.2507015 | |

Good | 0.2842767 | |

Fair | 0.1615868 | |

Poor | 0.0705370 | |

Then we can use those proportions (averages) to estimate the expected SBP assuming no one in the sample has diabetes.

.display _b[_cons] + _b[1.diabetes] * 0 + _b[2.hlthstat] * 0.2507015 + _b[3.hlthstat] * 0.2842767 + _b[4.hlthstat] * 0.1615868 + _b[5.hlthstat] * 0.0705370

We can also calculate the expected SBP assuming everyone in the sample has diabetes.

.display _b[_cons] + _b[1.diabetes] * 1 + _b[2.hlthstat] * 0.2507015 + _b[3.hlthstat] * 0.2842767 + _b[4.hlthstat] * 0.1615868 + _b[5.hlthstat] * 0.0705370

And we can check our work using **margins** with the **atmeans** option.

.margins diabetes, atmeansAdjusted predictions Number of obs = 10,335 Model VCE: OLS Expression: Linear prediction, predict() At: 0.diabetes = .9517175 (mean) 1.diabetes = .0482825 (mean) 1.hlthstat = .2328979 (mean) 2.hlthstat = .2507015 (mean) 3.hlthstat = .2842767 (mean) 4.hlthstat = .1615868 (mean) 5.hlthstat = .070537 (mean)

Delta-method | ||

Margin std. err. t P>|t| [95% conf. interval] | ||

diabetes | ||

Not diabetic | 130.3163 .2274263 573.00 0.000 129.8705 130.7621 | |

Diabetic | 142.1488 1.0333 137.57 0.000 140.1233 144.1742 | |

Estimating the average response (method 1) and the response at the average (method 2) gives us the same results for linear regression. But the results may differ for generalized linear models such as probit, logistic, or Poisson regression.

You can read more about factor-variable notation, **margins**, and **marginsplot** in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.

Read more in the *Stata Base Reference Manual*; see **[R] margins**, **[R] marginsplot**, and **[R] regress**. And in the *Stata Userâ€™s Guide*, see **[U-11] factor variables**.