Stata's **margins** and **marginsplot** commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including **probit**, **logistic**, **poisson**, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the **nhanes2l** dataset. Then let's **describe** and **summarize** the variables **bpsystol**, **hlthstat**, **diabetes**, **age**, and **bmi**.

```
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)
-------------------------------------------------------------------------------

. summarize bpsystol hlthstat diabetes age bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297
```

We are going to fit a series of linear regression models for the outcome variable **bpsystol**, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. **hlthstat** measures health status on a scale from 1 to 5. **diabetes** is a binary indicator of diabetes status coded 0 (not diabetic) or 1 (diabetic). **age** measures age and ranges from 20 to 74 years. And **bmi** measures body mass index and ranges from 12.4 to 61.1 kg/m².
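To see what the numeric codes of **hlthstat** and **diabetes** stand for, you can list their value labels (named **hlth** and **diabetes** in the **describe** output) and tabulate the two variables. The following is a minimal sketch; the final **summarize** leaves its results for the last variable listed (**diabetes**) in **r()**, which should reproduce the count and range shown above.

```stata
* Load the data, replacing any dataset already in memory
webuse nhanes2l, clear

* Show the text attached to each numeric code
label list hlth diabetes

* Frequencies of each category
tabulate hlthstat
tabulate diabetes

* summarize leaves results for the last variable listed (diabetes) in r()
summarize hlthstat diabetes
```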

Let's fit a linear regression model using the continuous outcome variable **bpsystol**, the binary predictor variable **diabetes**, and the continuous predictor variable **age**. Note that I have used factor-variable notation to tell Stata that **diabetes** is categorical (the **i.** prefix) and **age** is continuous (the **c.** prefix), and I have used the “##” operator to request the main effects of both predictors and their interaction.
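The **##** operator is shorthand. As a sketch, the model below is written once with **##** and once with the main effects and the **#** interaction typed out; the two coefficient vectors should be identical, which **mreldif()** (the maximum relative difference between two matrices) can confirm. The matrix and scalar names (**b_factorial**, **b_explicit**, **maxdiff**) are arbitrary choices for illustration.

```stata
webuse nhanes2l, clear

* Factorial operator: main effects plus interaction
regress bpsystol i.diabetes##c.age
matrix b_factorial = e(b)

* The same model with each term typed out
regress bpsystol i.diabetes c.age i.diabetes#c.age
matrix b_explicit = e(b)

* Largest relative difference between the two coefficient vectors
scalar maxdiff = mreldif(b_factorial, b_explicit)
display scalar(maxdiff)
```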

```
. regress bpsystol i.diabetes##c.age

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(3, 10345)     =   1071.05
       Model |  1335031.79         3  445010.595   Prob > F        =    0.0000
    Residual |  4298248.26    10,345  415.490407   R-squared       =    0.2370
-------------+----------------------------------   Adj R-squared   =    0.2368
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.384

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
    Diabetic |  -5.669005   4.952369    -1.14   0.252    -15.37661    4.038595
         age |   .6303981   .0119464    52.77   0.000     .6069808    .6538154
             |
   diabetes# |
       c.age |
    Diabetic |   .2233087   .0804934     2.77   0.006      .065526    .3810913
             |
       _cons |   100.5111   .5969456   168.38   0.000     99.34096    101.6812
------------------------------------------------------------------------------
```

The output can be challenging to interpret because the model contains two predictors and their interaction. We could carefully interpret each coefficient, or we could calculate the expected SBP by hand for combinations of diabetes status and selected values of **age**. Stata's **margins** command does this work for us: it estimates the expected SBP for combinations of the two predictor variables, or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the **regress** command but not in the **margins** command.

Let's estimate marginal predictions of SBP for a 20-year-old with and without **diabetes**.

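```
. margins diabetes, at(age=20)

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
At: age = 20

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |    113.119   .3815637   296.46   0.000     112.3711     113.867
    Diabetic |   111.9162   3.364884    33.26   0.000     105.3204     118.512
------------------------------------------------------------------------------
```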

We could compute these predictions manually from the regression coefficients, but it takes a lot of typing.

```
. display "E(SBP | no diabetes, age=20) = " 100.5111 + (-5.669005) * 0 + 0.6303981 * 20 + 0.2233087 * 0 * 20
E(SBP | no diabetes, age=20) = 113.11906

. display "E(SBP | diabetes, age=20) = " 100.5111 + (-5.669005) * 1 + 0.6303981 * 20 + 0.2233087 * 1 * 20
E(SBP | diabetes, age=20) = 111.91623
```
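Typing coefficients by hand also loses their full precision and gives no standard errors. A sketch of an alternative is **lincom**, which evaluates a linear combination of the stored coefficients and reports its standard error and confidence interval. Because each prediction here is linear in the coefficients, the estimates and standard errors should agree with the **margins** output above. The scalar names are hypothetical.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes##c.age

* Expected SBP for a 20-year-old without diabetes: b_cons + 20*b_age
lincom _b[_cons] + 20*_b[age]
scalar sbp20_nodiab    = r(estimate)
scalar sbp20_nodiab_se = r(se)

* With diabetes, add the diabetes main effect and 20 times the
* diabetes-by-age interaction
lincom _b[_cons] + _b[1.diabetes] + 20*_b[age] + 20*_b[1.diabetes#c.age]
scalar sbp20_diab = r(estimate)
```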

Next let's use **margins** to estimate the expected SBP for each category of **diabetes** at ages 20–60 in increments of 5 years.
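The expression **20(5)60** inside **at()** is Stata's numlist notation for "from 20 to 60 in steps of 5." One way to check how a numlist expands is the **numlist** command, which stores the expanded list in **r(numlist)**:

```stata
* Expand the numlist used in at(age=(20(5)60))
numlist "20(5)60"
display "`r(numlist)'"
```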

```
. margins diabetes, at(age=(20(5)60))

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
1._at: age = 20
2._at: age = 25
3._at: age = 30
4._at: age = 35
5._at: age = 40
6._at: age = 45
7._at: age = 50
8._at: age = 55
9._at: age = 60

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
_at#diabetes |
           1 #
Not diabetic |    113.119   .3815637   296.46   0.000     112.3711     113.867
  1#Diabetic |   111.9162   3.364884    33.26   0.000     105.3204     118.512
           2 #
Not diabetic |    116.271   .3327796   349.39   0.000     115.6187    116.9234
  2#Diabetic |   116.1847   2.983741    38.94   0.000      110.336    122.0335
           3 #
Not diabetic |    119.423   .2881485   414.45   0.000     118.8582    119.9879
  3#Diabetic |   120.4533   2.607642    46.19   0.000     115.3418    125.5648
           4 #
Not diabetic |    122.575   .2499055   490.49   0.000     122.0852    123.0649
  4#Diabetic |   124.7218   2.239132    55.70   0.000     120.3327    129.1109
           5 #
Not diabetic |    125.727   .2213861   567.91   0.000      125.293     126.161
  5#Diabetic |   128.9904   1.882671    68.51   0.000        125.3    132.6808
           6 #
Not diabetic |    128.879    .206656   623.64   0.000     128.4739    129.2841
  6#Diabetic |   133.2589   1.546613    86.16   0.000     130.2272    136.2905
           7 #
Not diabetic |    132.031   .2086565   632.77   0.000      131.622      132.44
  7#Diabetic |   137.5274   1.247557   110.24   0.000      135.082    139.9729
           8 #
Not diabetic |    135.183   .2269454   595.66   0.000     134.7381    135.6278
  8#Diabetic |    141.796    1.01863   139.20   0.000     139.7992    143.7927
           9 #
Not diabetic |    138.335   .2580829   536.01   0.000     137.8291    138.8409
  9#Diabetic |   146.0645   .9141335   159.78   0.000     144.2726    147.8564
------------------------------------------------------------------------------
```

The numbers reported in the Margin column are average values of the linear prediction of SBP for each combination of **diabetes** category and **age**. Because age and diabetes status fully determine the prediction in this model, each average is simply the expected SBP for that combination. For example, the output tells us that the expected SBP is 113.119 mmHg for a 20-year-old without diabetes and 146.0645 mmHg for a 60-year-old with diabetes.
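The estimates are also stored, so you can retrieve them instead of copying numbers from the screen. In the sketch below, **r(b)** is the row vector of margins, ordered 1#Not diabetic, 1#Diabetic, 2#Not diabetic, and so on; its 18 columns correspond to 9 ages times 2 diabetes categories. The names **M**, **first_margin**, and **last_margin** are illustrative.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes##c.age
quietly margins diabetes, at(age=(20(5)60))

* Copy the stored margins into a matrix and look at it
matrix M = r(b)
matrix list M

* Column 1: age 20, no diabetes; column 18: age 60, diabetes
scalar first_margin = M[1,1]
scalar last_margin  = M[1,18]
display scalar(first_margin) "  " scalar(last_margin)
```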

The output also reports a standard error, *t* statistic, *p*-value, and 95% confidence interval for each estimate. The *t* statistic tests the null hypothesis that the expected SBP equals zero, a hypothesis of little practical interest for blood pressure.
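Each *t* statistic is simply the margin divided by its standard error. The sketch below pulls both from **r(table)**, whose rows are b, se, t, pvalue, ll, ul, df, crit, and eform, and recomputes the *t* statistic for 20-year-olds without diabetes (113.119 / .3815637 ≈ 296.46). The names **T** and **t_nodiab** are illustrative.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes##c.age
quietly margins diabetes, at(age=20)

* Rows of r(table): b, se, t, pvalue, ll, ul, df, crit, eform
matrix T = r(table)

* t statistic for H0: expected SBP = 0 is margin / standard error
scalar t_nodiab = T[1,1] / T[2,1]
display scalar(t_nodiab) "  " T[3,1]
```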

We can plot the marginal predictions and their 95% confidence intervals by typing **marginsplot**.

```
. marginsplot

Variables that uniquely identify margins: age diabetes
```

Let's add more options to make our graph look nicer. We can use the **legend()** option to customize the legend, and the **title()**, **subtitle()**, and **ytitle()** options to add titles to the graph.

```
. marginsplot, ytitle("Expected systolic blood pressure (mmHg)") title("Expected systolic blood pressure") subtitle("By age and diabetes status") legend(order(1 "No diabetes" 2 "Diabetes") rows(1) position(12))

Variables that uniquely identify margins: age diabetes
```
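**marginsplot** accepts further options. For example, **recast()** changes how the predictions are drawn and **recastci()** changes how the confidence intervals are drawn. A sketch that draws lines with shaded confidence bands and saves the graph with the general **saving()** graph option; the file name **sbp_margins** is hypothetical.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes##c.age
quietly margins diabetes, at(age=(20(5)60))

* Lines for the predictions, shaded areas for the confidence intervals;
* saving() writes the graph to sbp_margins.gph
marginsplot, recast(line) recastci(rarea) ///
    ytitle("Expected systolic blood pressure (mmHg)") ///
    saving(sbp_margins, replace)
```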

We can also use **margins** to estimate marginal predictions for one variable averaged over the other variables in the model. For example, we can estimate the expected SBP for each category of **diabetes** averaged over the observed values of **age**.

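```
. margins diabetes

Predictive margins                                      Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |   130.5066   .2055351   634.96   0.000     130.1037    130.9094
    Diabetic |    135.463   1.385992    97.74   0.000     132.7462    138.1798
------------------------------------------------------------------------------
```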
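A sketch of what these predictive margins average: for every observation in the estimation sample, compute the predicted SBP as if that person did not have diabetes (**p0**) and as if they did (**p1**, whose age slope includes the interaction coefficient), and then average each prediction. The means should reproduce 130.5066 and 135.463. The variable and scalar names are hypothetical, and the sketch omits the delta-method standard errors that **margins** also reports.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes##c.age

* Counterfactual predictions for every observation
generate double p0 = _b[_cons] + _b[age]*age
generate double p1 = _b[_cons] + _b[1.diabetes] ///
    + (_b[age] + _b[1.diabetes#c.age])*age

* Average over the estimation sample
summarize p0 if e(sample), meanonly
scalar pm0 = r(mean)
summarize p1 if e(sample), meanonly
scalar pm1 = r(mean)
display scalar(pm0) "  " scalar(pm1)
```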

Let's work through a simpler example without the interaction to see how **margins** works. We fit a linear regression model that includes **diabetes** and **age** but no interaction. The **coeflegend** option displays a legend with the names, such as **_b[age]**, that we can use to refer to each coefficient.

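```
. regress bpsystol i.diabetes c.age, coeflegend

      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(2, 10346)     =   1601.69
       Model |  1331833.99         2  665916.993   Prob > F        =    0.0000
    Residual |  4301446.06    10,346  415.759333   R-squared       =    0.2364
-------------+----------------------------------   Adj R-squared   =    0.2363
       Total |  5633280.05    10,348   544.38346   Root MSE        =     20.39

------------------------------------------------------------------------------
    bpsystol | Coefficient  Legend
-------------+----------------------------------------------------------------
    diabetes |
    Diabetic |   7.815281  _b[1.diabetes]
         age |   .6353169  _b[age]
       _cons |   100.2803  _b[_cons]
------------------------------------------------------------------------------
```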

Let's display the contents of **_b[1.diabetes]** to verify that it equals 7.815281.

```
. display _b[1.diabetes]
7.8152815
```
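The names shown by **coeflegend** are the column names of the coefficient vector **e(b)**, which also contains a column for the base level **0b.diabetes** (fixed at 0). A short sketch listing them and using them in an expression; the expected SBP for a 50-year-old with diabetes is only an illustrative combination, and the matrix name **b** is arbitrary.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes c.age

* Column names of e(b) are the names used inside _b[]
matrix b = e(b)
matrix list b

* Expected SBP for a 50-year-old with diabetes
display _b[_cons] + _b[1.diabetes] + 50*_b[age]
```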

Now we can use the stored coefficients, with the diabetes indicator set to 0, to **generate** a new variable that equals the expected SBP for each observation, assuming that no one in the sample has diabetes.

```
. generate double sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[age] * age
```

Next we can **generate** a new variable that equals the expected SBP assuming every observation in the sample has diabetes.

```
. generate double sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[age] * age
```

Then we can average each variable to estimate the expected SBP for people without and with diabetes. The qualifier **if e(sample)** restricts the calculation to the estimation sample, that is, the observations with nonmissing values of **bpsystol**, **diabetes**, and **age**.

```
. table () if e(sample), statistic(mean sbp_diab0 sbp_diab1)

--------------------
sbp_diab0 | 130.5098
sbp_diab1 | 138.3251
--------------------
```

This matches the results reported by **margins**.
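Because this model has no interaction, **sbp_diab1** and **sbp_diab0** differ by exactly **_b[1.diabetes]** for every observation, so the gap between the two averages (138.3251 - 130.5098 ≈ 7.8153) equals the diabetes coefficient. A quick check on a fresh copy of the data; the variable **sbp_gap** is hypothetical.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes c.age

generate double sbp_diab0 = _b[_cons] + _b[age]*age
generate double sbp_diab1 = _b[_cons] + _b[1.diabetes] + _b[age]*age

* Observation-level difference between the two counterfactual predictions
generate double sbp_gap = sbp_diab1 - sbp_diab0
summarize sbp_gap if e(sample)
```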

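```
. margins diabetes

Predictive margins                                      Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |   130.5098   .2055982   634.78   0.000     130.1068    130.9128
    Diabetic |   138.3251   .9258365   149.41   0.000     136.5103    140.1399
------------------------------------------------------------------------------
```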

In the previous example, we first calculated the expected response for each observation and then averaged those responses. This is the default method used by **margins**. Alternatively, we could first calculate the average covariate values and then compute the response at those average values.
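A sketch of the same observation-level approach using **predict** instead of typing the coefficients: set **diabetes** to 0 for everyone in the estimation sample, predict, set it to 1, predict again, and average. **preserve** and **restore** put the original data back afterward. This is meant only to illustrate the logic of the default method, not to reproduce **margins**' internals or its standard errors; the names **xb0**, **xb1**, **avg0**, and **avg1** are hypothetical.

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes c.age

preserve
* Everyone in the estimation sample treated as not diabetic
replace diabetes = 0 if e(sample)
predict double xb0 if e(sample), xb
* Everyone treated as diabetic
replace diabetes = 1 if e(sample)
predict double xb1 if e(sample), xb

* Average the counterfactual predictions; scalars survive restore
summarize xb0, meanonly
scalar avg0 = r(mean)
summarize xb1, meanonly
scalar avg1 = r(mean)
restore

display scalar(avg0) "  " scalar(avg1)
```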

Let's begin by using **table** to compute the mean of **age**. Again, the qualifier **if e(sample)** restricts the calculation to observations with nonmissing values of **bpsystol**, **diabetes**, and **age**.

```
. table () if e(sample), statistic(mean age)

--------------
Mean | 47.5818
--------------
```

Then we can use the mean **age** to estimate the expected SBP assuming no one in the sample has diabetes.

```
. display _b[_cons] + _b[1.diabetes] * 0 + _b[age] * 47.5818
```

We can also calculate the expected SBP assuming everyone in the sample has diabetes.

```
. display _b[_cons] + _b[1.diabetes] * 1 + _b[age] * 47.5818
```
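Typing 47.5818 uses a rounded mean. A sketch that keeps full precision by storing the mean from **summarize** in a scalar before computing both predictions (the scalar names are hypothetical):

```stata
webuse nhanes2l, clear
quietly regress bpsystol i.diabetes c.age

* Mean age in the estimation sample, kept at full precision
summarize age if e(sample), meanonly
scalar mean_age = r(mean)

* Expected SBP at the mean age, without and with diabetes
scalar sbp_atmean0 = _b[_cons] + _b[age]*scalar(mean_age)
scalar sbp_atmean1 = _b[_cons] + _b[1.diabetes] + _b[age]*scalar(mean_age)
display scalar(sbp_atmean0) "  " scalar(sbp_atmean1)
```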

And we can check our work using **margins** with the **atmeans** option.

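```
. margins diabetes, atmeans

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
At: 0.diabetes = .9517828 (mean)
    1.diabetes = .0482172 (mean)
    age        =  47.5818 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
Not diabetic |   130.5098   .2055982   634.78   0.000     130.1068    130.9128
    Diabetic |   138.3251   .9258365   149.41   0.000     136.5103    140.1399
------------------------------------------------------------------------------
```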

Again, the manually calculated results match the results produced by **margins**.

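Estimating the average response (method 1) and the response at the average covariate values (method 2) gives the same results for linear regression, because the linear prediction is a linear function of the covariates: the average of the predictions equals the prediction at the average. For nonlinear models such as probit, logistic, or Poisson regression, the default predictions (probabilities or counts) are nonlinear functions of the covariates, so the two methods generally give different results.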
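To see the difference, here is a sketch with logistic regression. The binary outcome **sbp140** (SBP of at least 140 mmHg) is a hypothetical variable created only for this illustration, as are the matrix names. On the default probability scale, the average predicted probability and the predicted probability at the mean of **age** should differ; on the log-odds scale (**predict(xb)**), which is linear in the covariates, they should agree.

```stata
webuse nhanes2l, clear

* Hypothetical binary outcome: SBP of 140 mmHg or higher
generate byte sbp140 = bpsystol >= 140 if !missing(bpsystol)
quietly logit sbp140 i.diabetes c.age

* Method 1: average of predicted probabilities (the default)
quietly margins diabetes
matrix avg_pred = r(b)

* Method 2: predicted probability at the mean of the covariates
quietly margins diabetes, atmeans
matrix pred_at_avg = r(b)
matrix list avg_pred
matrix list pred_at_avg

* On the linear (log-odds) scale the two methods agree
quietly margins diabetes, predict(xb)
matrix avg_xb = r(b)
quietly margins diabetes, predict(xb) atmeans
matrix xb_at_avg = r(b)
matrix list avg_xb
matrix list xb_at_avg
```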

You can read more about factor-variable notation, **margins**, and **marginsplot** in the Stata documentation.

Read more in the *Stata Base Reference Manual*; see **[R] margins**, **[R] marginsplot**, and **[R] regress**. And in the *Stata User's Guide*, see **[U] 11.4.3 Factor variables**.