margins and marginsplot for the interaction of categorical and continuous predictor variables

Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi


Variable      Storage   Display    Value
    name         type    format    label      Variable label
                                                                                
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)


. summarize bpsystol hlthstat diabetes age bmi


    Variable          Obs        Mean    Std. dev.       Min        Max
   
    bpsystol       10,351    130.8817    23.33265         65        300
    hlthstat       10,335    2.586164    1.206196          1          5
    diabetes       10,349    .0482172    .2142353          0          1
         age       10,351    47.57965    17.21483         20         74
         bmi       10,351     25.5376    4.914969    12.3856    61.1297

We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. hlthstat measures health status on a scale from 1 to 5. diabetes is a binary indicator of diabetes status, coded 0 (not diabetic) or 1 (diabetic). age measures age and ranges from 20 to 74 years. And bmi measures body mass index and ranges from 12.4 to 61.1 kg/m².
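
Before treating diabetes as a factor variable, it can help to confirm how it is coded. The following is a small sketch using standard commands; the value-label name diabetes is taken from the describe output above, and the missing option is added only to show the two observations with no recorded diabetes status.

. * Frequencies with value labels, then with the underlying codes
. tabulate diabetes, missing
. tabulate diabetes, nolabel missing

. * Mapping from numeric codes to labels
. label list diabetes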

Let's fit a linear regression model using the continuous outcome variable bpsystol, the binary predictor variable diabetes, and the continuous predictor variable age. Note that I have used factor-variable notation to tell Stata that diabetes is categorical and age is continuous, and I have used the “##” operator to request the main effects and interaction of both predictor variables.

. regress bpsystol i.diabetes##c.age


      Source         SS           df       MS      Number of obs   =    10,349
                                                   F(3, 10345)     =   1071.05
       Model    1335031.79         3  445010.595   Prob > F        =    0.0000
    Residual    4298248.26    10,345  415.490407   R-squared       =    0.2370
                                                   Adj R-squared   =    0.2368
       Total    5633280.05    10,348   544.38346   Root MSE        =    20.384

    bpsystol   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
    diabetes                                                                  
   Diabetic     -5.669005   4.952369    -1.14   0.252    -15.37661    4.038595
         age     .6303981   .0119464    52.77   0.000     .6069808    .6538154
                                                                              
    diabetes#                                                                  
       c.age                                                                  
   Diabetic      .2233087   .0804934     2.77   0.006      .065526    .3810913
                                                                              
       _cons     100.5111   .5969456   168.38   0.000     99.34096    101.6812
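
The “##” operator is shorthand. As a sketch, the same model can be requested by listing the main effects and the “#” interaction separately. The age slope among people with diabetes is the sum of the age coefficient and the interaction coefficient (.6303981 + .2233087 = .8537068), and lincom is one way to estimate that sum along with its standard error.

. * Equivalent specification: main effects plus the interaction term
. regress bpsystol i.diabetes c.age i.diabetes#c.age

. * Age slope for people with diabetes
. lincom age + 1.diabetes#c.age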

The output can be challenging to interpret because we have two predictors and an interaction. We could spend our time carefully interpreting each coefficient, or we could calculate the expected SBP for combinations of diabetes status and various values of age. But Stata's margins command will estimate the expected SBP for combinations of the two predictor variables or for one predictor “adjusted for” the other. Note that the “i.” prefix is required in the regress command but not in the margins command.

Let's estimate marginal predictions of SBP for a 20-year-old with and without diabetes.

. margins diabetes, at(age=20)

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
At: age = 20



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic       113.119   .3815637   296.46   0.000     112.3711     113.867
    Diabetic      111.9162   3.364884    33.26   0.000     105.3204     118.512

We could do this manually, but it would be a lot of typing.

. display "E(SBP | no diabetes, age=20) = "    100.5111              ///
                                              + (-5.669005) * 0      ///
                                              + 0.6303981   * 20     ///
                                              + 0.2233087   * 0 * 20
E(SBP | no diabetes, age=20) = 113.11906

. display "E(SBP | diabetes, age=20) = "    100.5111              ///
                                           + (-5.669005) * 1      ///
                                           + 0.6303981   * 20     ///
                                           + 0.2233087   * 1 * 20
E(SBP | diabetes, age=20) = 111.91623
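
Some of the typing can be avoided by referring to the stored coefficients instead of copying them from the output. The following is a sketch that assumes the interaction model above is still the active estimation result; the names inside _b[] follow the notation that the coeflegend option of regress displays. The results should agree with the values above up to rounding.

. * Expected SBP at age 20 without diabetes: intercept plus age effect
. display "E(SBP | no diabetes, age=20) = " _b[_cons] + _b[age]*20

. * Expected SBP at age 20 with diabetes: add the main effect and the interaction
. display "E(SBP | diabetes, age=20) = " _b[_cons] + _b[1.diabetes] + _b[age]*20 + _b[1.diabetes#c.age]*20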

Next let's use margins to estimate the expected SBP for each category of diabetes at ages 20–60 in increments of 5 years.

. margins diabetes, at(age=(20(5)60))


Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
1._at: age = 20
2._at: age = 25
3._at: age = 30
4._at: age = 35
5._at: age = 40
6._at: age = 45
7._at: age = 50
8._at: age = 55
9._at: age = 60



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
 _at#diabetes                                                                  
           1 #                                                                  
Not diabetic       113.119   .3815637   296.46   0.000     112.3711     113.867
  1#Diabetic      111.9162   3.364884    33.26   0.000     105.3204     118.512
           2 #                                                                  
Not diabetic       116.271   .3327796   349.39   0.000     115.6187    116.9234
  2#Diabetic      116.1847   2.983741    38.94   0.000      110.336    122.0335
           3 #                                                                  
Not diabetic       119.423   .2881485   414.45   0.000     118.8582    119.9879
  3#Diabetic      120.4533   2.607642    46.19   0.000     115.3418    125.5648
           4 #                                                                  
Not diabetic       122.575   .2499055   490.49   0.000     122.0852    123.0649
  4#Diabetic      124.7218   2.239132    55.70   0.000     120.3327    129.1109
           5 #                                                                  
Not diabetic       125.727   .2213861   567.91   0.000      125.293     126.161
  5#Diabetic      128.9904   1.882671    68.51   0.000        125.3    132.6808
           6 #                                                                  
Not diabetic       128.879    .206656   623.64   0.000     128.4739    129.2841
  6#Diabetic      133.2589   1.546613    86.16   0.000     130.2272    136.2905
           7 #                                                                  
Not diabetic       132.031   .2086565   632.77   0.000      131.622      132.44
  7#Diabetic      137.5274   1.247557   110.24   0.000      135.082    139.9729
           8 #                                                                  
Not diabetic       135.183   .2269454   595.66   0.000     134.7381    135.6278
  8#Diabetic       141.796    1.01863   139.20   0.000     139.7992    143.7927
           9 #                                                                  
Not diabetic       138.335   .2580829   536.01   0.000     137.8291    138.8409
  9#Diabetic      146.0645   .9141335   159.78   0.000     144.2726    147.8564

The numbers reported in the Margin column are average values of the linear prediction of SBP for each combination of diabetes category and age. For example, the output tells us that the expected SBP is 113.119 for a 20-year-old person without diabetes and the expected SBP is 146.0645 for a 60-year-old person with diabetes.

The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero.

We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.

. marginsplot

Variables that uniquely identify margins: age diabetes

Let's add more options to make our graph look nicer. We can use the legend() option to customize the look of the legend. And we can use the title(), subtitle(), and ytitle() options to add various titles to our graph.

. marginsplot, ytitle("Expected systolic blood pressure (mmHg)")   ///
               title("Expected systolic blood pressure")            ///
               subtitle("By age and diabetes status")               ///
               legend(order(1 "No diabetes" 2 "Diabetes")           ///
                      rows(1) position(12))

Variables that uniquely identify margins: age diabetes
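
The appearance of the predictions and their confidence intervals can be changed as well. As a sketch, recast(line) draws the predictions as lines without markers, and recastci(rarea) draws the confidence intervals as shaded areas rather than capped spikes. The margins command is repeated quietly here only so that the example is self-contained.

. * Recompute the margins so that marginsplot has results to plot
. quietly margins diabetes, at(age=(20(5)60))

. * Lines for the predictions, shaded areas for the confidence intervals
. marginsplot, recast(line) recastci(rarea)

Where the shaded areas overlap, one may hide the other; the ciopts() option can be used to adjust how they are drawn.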

Marginal effects

We can also use margins to estimate marginal predictions for one variable averaged over other variables in the model. For example, we can estimate the expected SBP for categories of diabetes averaged over age.

. margins diabetes

Predictive margins                                      Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.5066   .2055351   634.96   0.000     130.1037    130.9094
    Diabetic       135.463   1.385992    97.74   0.000     132.7462    138.1798
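
The difference between these two margins is the average effect of diabetes on expected SBP. One way to estimate that difference directly, with its own standard error, is the dydx() option; for a factor variable, margins reports the discrete change from the base category. This sketch assumes the interaction model is still the active estimation result. The first estimate should equal the difference between the two margins above, 135.463 - 130.5066 = 4.9564.

. * Average difference in expected SBP, diabetic versus not diabetic
. margins, dydx(diabetes)

. * The same difference evaluated at ages 20 to 60 in steps of 5 years
. margins, dydx(diabetes) at(age=(20(5)60))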

How does it work?

Method 1: Average response

Let's work a simpler example without the interaction to help us understand how margins works. Let's fit a linear regression model including diabetes and age without the interaction. The option coeflegend displays a legend that includes terms that refer to the coefficients in the model.

. regress bpsystol i.diabetes c.age, coeflegend


      Source         SS           df       MS      Number of obs   =    10,349
                                                   F(2, 10346)     =   1601.69
       Model    1331833.99         2  665916.993   Prob > F        =    0.0000
    Residual    4301446.06    10,346  415.759333   R-squared       =    0.2364
                                                   Adj R-squared   =    0.2363
       Total    5633280.05    10,348   544.38346   Root MSE        =     20.39

    bpsystol   Coefficient  Legend                                            
   
    diabetes                                                                  
   Diabetic      7.815281  _b[1.diabetes]                                     
         age     .6353169  _b[age]                                            
       _cons     100.2803  _b[_cons]

Let's display the contents of _b[1.diabetes] to verify that it equals 7.815281.

. display _b[1.diabetes]
7.8152815

Now we can use coefficients and indicator variables to generate a new variable that equals the expected SBP assuming every observation in the sample does not have diabetes.

. generate double sbp_diab0 = _b[_cons] + _b[1.diabetes]*0 + _b[age] * age

Next we can generate a new variable that equals the expected SBP assuming every observation in the sample has diabetes.

. generate double sbp_diab1 = _b[_cons] + _b[1.diabetes]*1 + _b[age] * age

Then we can calculate the mean of each new variable to estimate the expected SBP if no one in the sample had diabetes and if everyone in the sample had diabetes. The qualifier if e(sample) restricts the calculation to the observations used to fit the model, that is, those with nonmissing values of bpsystol, diabetes, and age.

. table () if e(sample), statistic(mean sbp_diab0 sbp_diab1)



sbp_diab0    130.5098
sbp_diab1    138.3251

This matches the results reported by margins.

. margins diabetes

Predictive margins                                      Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.5098   .2055982   634.78   0.000     130.1068    130.9128
    Diabetic      138.3251   .9258365   149.41   0.000     136.5103    140.1399
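
The same averages can be reproduced without writing out the regression equation, by temporarily setting diabetes to a single value for everyone and letting predict compute the linear prediction. This is only a sketch of the idea: the variable names xb_diab0 and xb_diab1 are hypothetical, and preserve and restore are used so that the original data are left unchanged.

. * Keep a copy of the data, because diabetes is about to be overwritten
. preserve

. * Treat everyone as not diabetic and predict SBP
. quietly replace diabetes = 0
. predict double xb_diab0 if e(sample), xb

. * Treat everyone as diabetic and predict SBP
. quietly replace diabetes = 1
. predict double xb_diab1 if e(sample), xb

. * The two means should match the Margin column reported by margins
. summarize xb_diab0 xb_diab1

. * Bring back the original data
. restore

Because predict works after most estimation commands, the same steps carry over to models where the prediction is not a simple linear equation.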

Method 2: Response at average

In the previous example, we first calculated the response for each observation and then calculated the average of those responses. This is the default method. But we could also calculate the average covariate values first and then report the response at those average values.

Let's begin by using table to estimate the mean of age. The qualifier if e(sample) again restricts the calculation to the observations used to fit the model.

. table () if e(sample), statistic(mean age)



Mean    47.5818

Then we can use the mean age to estimate the expected SBP assuming no one in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 0  + _b[age] * 47.5818

We can also calculate the expected SBP assuming everyone in the sample has diabetes.

. display _b[_cons] + _b[1.diabetes] * 1  + _b[age] * 47.5818
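
Typing the mean by hand uses a rounded value. As a sketch, summarize stores the mean in r(mean), which can be used in the calculation directly.

. * Store the mean of age for the estimation sample in r(mean)
. quietly summarize age if e(sample)

. * Expected SBP at the mean age, without and with diabetes
. display _b[_cons] + _b[1.diabetes] * 0 + _b[age] * r(mean)
. display _b[_cons] + _b[1.diabetes] * 1 + _b[age] * r(mean)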

And we can check our work using margins with the atmeans option.

. margins diabetes, atmeans

Adjusted predictions                                    Number of obs = 10,349
Model VCE: OLS

Expression: Linear prediction, predict()
At: 0.diabetes = .9517828 (mean)
    1.diabetes = .0482172 (mean)
    age        =  47.5818 (mean)



                           Delta-method                                        
                    Margin   std. err.      t    P>|t|     [95% conf. interval]
   
     diabetes                                                                  
Not diabetic      130.5098   .2055982   634.78   0.000     130.1068    130.9128
    Diabetic      138.3251   .9258365   149.41   0.000     136.5103    140.1399

Again, the manually calculated results match the results produced by margins.

Estimating the average response (method 1) and the response at the average (method 2) gives us the same results for linear regression. But the results may differ for generalized linear models such as probit, logistic, or Poisson regression.
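
To see the two methods differ, one could repeat the comparison with a logistic regression model. The sketch below assumes that nhanes2l contains a binary high blood pressure indicator named highbp; that variable is not used elsewhere on this page, so check that it exists with describe before running the example. Note that fitting this model replaces the linear regression as the active estimation result.

. * Logistic regression for a binary outcome (highbp is an assumed variable)
. logistic highbp i.diabetes c.age

. * Method 1: average of the predicted probabilities
. margins diabetes

. * Method 2: predicted probability at the mean of age
. margins diabetes, atmeans

Because the predicted probability is a nonlinear function of age, the two sets of margins will generally not be identical.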

You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.

See it in action

Watch Introduction to margins in Stata, part 3: Interactions.

Watch Profile plots and interaction plots in Stata: Interactions of categorical and continuous variables.

