margins and marginsplot for a categorical predictor variable

Stata's margins and marginsplot commands are powerful tools for visualizing the results of regression models. We will use linear regression below, but the same principles and syntax work with nearly all of Stata's regression commands, including probit, logistic, poisson, and others. You will want to review Stata's factor-variable notation if you have not used it before.

Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe bpsystol hlthstat diabetes age bmi


Variable      Storage   Display    Value
    name         type    format    label      Variable label
                                                                                
bpsystol        int     %9.0g                 Systolic blood pressure
hlthstat        byte    %20.0g     hlth       Health status
diabetes        byte    %12.0g     diabetes   Diabetes status
age             byte    %9.0g                 Age (years)
bmi             float   %9.0g                 Body mass index (BMI)


. summarize bpsystol hlthstat diabetes age bmi


    Variable          Obs        Mean    Std. dev.       Min        Max
   
    bpsystol       10,351    130.8817    23.33265         65        300
    hlthstat       10,335    2.586164    1.206196          1          5
    diabetes       10,349    .0482172    .2142353          0          1
         age       10,351    47.57965    17.21483         20         74
         bmi       10,351     25.5376    4.914969    12.3856    61.1297

We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure (SBP) and ranges from 65 to 300 mmHg. hlthstat measures self-reported health status in five ordered categories coded 1 to 5. diabetes is a binary indicator of diabetes status coded 0 or 1. age measures age and ranges from 20 to 74 years. And bmi measures body mass index and ranges from 12.4 to 61.1 kg/m².

The description tells us that the value label hlth is attached to the variable hlthstat. Let's type label list hlth to view the categories of hlthstat.

. label list hlth

hlth:
           1 Excellent
           2 Very good
           3 Good
           4 Fair
           5 Poor
          .a Blank but applicable

The variable hlthstat has five categories numbered 1 through 5 and labeled “Excellent”, “Very good”, “Good”, “Fair”, and “Poor”, respectively. The code “.a” is an extended missing value, so observations with that value are omitted from the regression model.
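One way to see how many observations are affected is to tabulate hlthstat with its missing values shown and then count them. This is a minimal sketch that assumes the nhanes2l data used above; the summarize output already implies the answer, because hlthstat has 10,335 nonmissing values out of 10,351 observations.

```stata
* Reload the example data so this fragment runs on its own
webuse nhanes2l, clear

* Show every category of hlthstat, including missing-value codes
tabulate hlthstat, missing

* Count observations with any missing value of hlthstat
count if missing(hlthstat)
```

The count should equal the gap between the two numbers of observations reported by summarize.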

Let's fit a linear regression model using the continuous outcome variable bpsystol and the categorical predictor variable hlthstat. Note that I have used factor-variable notation to tell Stata that hlthstat is a categorical predictor.

. regress bpsystol i.hlthstat


      Source         SS           df       MS      Number of obs   =    10,335
                                                   F(4, 10330)     =    158.34
       Model    325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual    5304728.67    10,330  513.526492   R-squared       =    0.0578
                                                   Adj R-squared   =    0.0574
       Total    5629973.35    10,334  544.800982   Root MSE        =    22.661

    bpsystol   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
    hlthstat                                                                  
  Very good      2.981587   .6415165     4.65   0.000      1.72409    4.239083
       Good      8.034913   .6230047    12.90   0.000     6.813703    9.256123
       Fair      14.71925    .721698    20.40   0.000     13.30459    16.13392
       Poor      16.42304   .9580047    17.14   0.000     14.54517    18.30092
                                                                              
       _cons     124.3191   .4618951   269.15   0.000     123.4137    125.2245

The output does not include the “Excellent” hlthstat category because, by default, Stata uses the category with the smallest value as the reference (base) category. So the coefficient labeled “_cons” is the expected SBP for the “Excellent” category of hlthstat.

Each of the remaining coefficients is the difference between the expected SBP in that group and the expected SBP in the “Excellent” group. For example, the expected SBP in the “Poor” group is 16.42304 mmHg higher than in the “Excellent” group.
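If a different comparison group is more natural, factor-variable notation lets us choose the base category with the ib#. prefix. The sketch below, which assumes the same data, makes “Poor” (level 5) the base. The constant then becomes the expected SBP for the “Poor” group, and the other coefficients become differences from that group.

```stata
* Reload the example data so this fragment runs on its own
webuse nhanes2l, clear

* ib5. makes level 5 ("Poor") the base category instead of level 1
regress bpsystol ib5.hlthstat

* The constant is now the expected SBP in the "Poor" group
display _b[_cons]
```

The fitted model is the same; only the parameterization changes. The rest of this walkthrough keeps the default base category, “Excellent”.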

We could estimate the expected SBP in the “Poor” group by adding the coefficient for “_cons” and the coefficient for “Poor”.

. display "E(SBP | hlthstat=Poor) = "  124.3191 + 16.42304
E(SBP | hlthstat=Poor) = 140.74214

We could do the same calculation for the other groups:

. display "E(SBP | hlthstat=Very good) = "  124.3191 + 2.981587
E(SBP | hlthstat=Very good) = 127.30069

. display "E(SBP | hlthstat=Good) = "  124.3191 + 8.034913
E(SBP | hlthstat=Good) = 132.35401

. display "E(SBP | hlthstat=Fair) = "  124.3191 + 14.71925
E(SBP | hlthstat=Fair) = 139.03835
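Typing the coefficients by hand invites rounding and transcription errors. As a sketch of an alternative, we can refer to the stored coefficients with _b[], or use lincom, which also reports a standard error and confidence interval for the sum. The fragment assumes the model above is refit first.

```stata
* Reload the data and refit the model so this fragment runs on its own
webuse nhanes2l, clear
quietly regress bpsystol i.hlthstat

* Same sum as above, using stored coefficients instead of typed numbers
display "E(SBP | hlthstat=Poor) = " _b[_cons] + _b[5.hlthstat]

* lincom adds a standard error, t test, and 95% confidence interval
lincom _cons + 5.hlthstat
```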

Stata's margins command will estimate the expected SBP for each group. Note that the “i.” prefix is required in the regress command but not in the margins command.

. margins hlthstat

Adjusted predictions                                    Number of obs = 10,335
Model VCE: OLS

Expression: Linear prediction, predict()



                          Delta-method                                        
                   Margin   std. err.      t    P>|t|     [95% conf. interval]
   
    hlthstat                                                                  
  Excellent      124.3191   .4618951   269.15   0.000     123.4137    125.2245
  Very good      127.3007   .4451924   285.95   0.000      126.428    128.1733
       Good       132.354   .4180763   316.58   0.000     131.5345    133.1735
       Fair      139.0383   .5545276   250.73   0.000     137.9513    140.1253
       Poor      140.7421   .8393008   167.69   0.000     139.0969    142.3873

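The output also reports a standard error, t statistic, p-value, and 95% confidence interval for each estimate. The t statistic tests the null hypothesis that the expected SBP is zero, which is rarely of interest for blood pressure, so the confidence intervals are usually the more useful part of the output.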
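Because this model contains only one categorical predictor, each margin is simply the sample mean of bpsystol within that category. A quick check, assuming the same data, is to compute the group means directly and compare them with the Margin column above.

```stata
* Reload the example data so this fragment runs on its own
webuse nhanes2l, clear

* Mean SBP and group size within each nonmissing health-status category
tabstat bpsystol if !missing(hlthstat), by(hlthstat) statistics(mean n)
```

The means should agree with the margins, but group-specific standard errors generally will not, because the regression pools the residual variance across all five groups.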

We can plot the marginal predictions and their 95% confidence intervals by typing marginsplot.

. marginsplot

Variables that uniquely identify margins: hlthstat

By default, marginsplot creates a profile plot using lines. We can use the recast(bar) option if we prefer a bar chart, or “dynamite plunger plot”.

. marginsplot, recast(bar)

Variables that uniquely identify margins: hlthstat
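If neither connecting lines nor bars suit the data, another possibility is a point-and-interval plot. The sketch below refits the model and reruns margins so that it stands alone; recast(scatter) draws the estimates as markers and leaves the default confidence-interval plot unchanged.

```stata
* Reload the data, refit the model, and recompute the margins
webuse nhanes2l, clear
quietly regress bpsystol i.hlthstat
quietly margins hlthstat

* Markers only for the estimates, with the default confidence intervals
marginsplot, recast(scatter)
```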

We can add the horizontal option to create a horizontal bar chart.

. marginsplot, recast(bar) horizontal

Variables that uniquely identify margins: hlthstat

Let's add some options to make our graph look nicer. The plotopts(barwidth(0.8)) option adds some space between the bars, and the title(), subtitle(), xtitle(), and ytitle() options add titles to the graph. The command below uses the first three.

. marginsplot, recast(bar) horizontal plotopts(barwidth(0.8))  ///
               title("Expected systolic blood pressure (mmHg)") ///
               subtitle("By health status")                     ///
               xtitle("Expected systolic blood pressure (mmHg)")

Variables that uniquely identify margins: hlthstat
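To reuse a finished graph in a report, we can save it to a file with graph export. This is a sketch only: the file name sbp_by_health.png is a placeholder, and the file is written to the current working directory.

```stata
* Reload the data, refit the model, and recompute the margins
webuse nhanes2l, clear
quietly regress bpsystol i.hlthstat
quietly margins hlthstat

* Draw a horizontal bar chart of the margins
marginsplot, recast(bar) horizontal

* Export the graph currently in memory (file name is a placeholder)
graph export "sbp_by_health.png", replace
```

The file extension determines the output format, so changing the suffix to .pdf or .svg would request those formats instead.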

You can read more about factor-variable notation, margins, and marginsplot in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.

See it in action

Watch Introduction to margins in Stata, part 1: Categorical variables.

Watch Stata Quick Tip: Margins.

Watch Profile plots and interaction plots in Stata: A single categorical variable.

