Causal mediation analysis with two mediators

Highlights

  • Causal mediation with sequential or parallel mediators

  • Finest possible decomposition into path-specific effects

  • Coarser decomposition and mediator-specific effects

  • Direct effects, indirect effects, total effect, potential-outcome means, and controlled direct effects

  • Support for several types of interaction effects

  • Binary and continuous treatments

  • Sequential mediation sensitivity analysis

The mediate command now allows you to perform causal mediation analysis with two mediators, whether they are sequential or parallel. Estimate natural indirect effects, natural direct effects, and controlled direct effects. Obtain mediator-specific effects. Perform sensitivity analysis with sequential mediators. This feature is a part of StataNow™.

Causal mediation analysis goes beyond estimating a causal effect to exploring how the effect may arise. Is the effect of a treatment on an outcome mediated through other variables?

In Stata 18, we introduced the mediate command for fitting causal mediation models with one mediator. Now, mediate can also fit causal mediation models with two mediators. These mediators can be causally ordered (sequential) or not causally ordered (parallel). In either case, you can estimate the total effect and see how it can be decomposed into direct and indirect effects.

Let's see it work

Parallel mediation

Suppose we wish to find out whether arthritis in humans affects their health-related quality of life. To the extent that it does, we also wish to disentangle the causal pathways through which this effect operates. In particular, we are interested in determining whether physical activity and depression mediate this effect. Does arthritis decrease physical activity, which in turn decreases quality of life? Does arthritis increase severity of depression, which in turn decreases quality of life?

To evaluate these potential mediating effects, we use fictional data that include the health-related quality-of-life outcome variable (hrqol), the mediator variables physical activity (physact) and depression (depress), and the binary treatment variable arthritis (arthritis). Ignoring potentially confounding variables, we can draw a causal diagram to represent our basic model:

[Causal diagram: arthritis \(\rightarrow\) hrqol; arthritis \(\rightarrow\) physact \(\rightarrow\) hrqol; arthritis \(\rightarrow\) depress \(\rightarrow\) hrqol]

There are three causal pathways of interest: a direct pathway from arthritis to hrqol, an indirect path from arthritis to hrqol via physact, and an indirect path from arthritis to hrqol via depress. Our goal is to decompose the total causal effect of arthritis on hrqol into these three path-specific components. Because there is no causal path that goes through both mediators, we refer to these mediators as parallel.

Starting with a simple model that does not include covariates or any interaction effects, we specify the following causal mediation model:

. mediate (hrqol) (depress) (physact) (arthritis)

Iteration 0:  EE criterion = 1.537e-27  
Iteration 1:  EE criterion = 8.303e-30  

Causal mediation analysis                                Number of obs = 2,000

Mediation type: Parallel
Mediator 1:     physact
Mediator 2:     depress
Treatment type: Binary
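------------------------------------------------------------------------------
             |               Robust
       hrqol | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NDE          |
   arthritis |
(Yes vs No)  |   -1.71165   .1496673   -11.44   0.000    -2.004993   -1.418308
-------------+----------------------------------------------------------------
NIE1         |
   arthritis |
(Yes vs No)  |  -1.814969   .1817366    -9.99   0.000    -2.171166   -1.458772
-------------+----------------------------------------------------------------
NIE2         |
   arthritis |
(Yes vs No)  |  -1.252368   .2507925    -4.99   0.000    -1.743912   -.7608239
-------------+----------------------------------------------------------------
TE           |
   arthritis |
(Yes vs No)  |  -4.778988   .2912224   -16.41   0.000    -5.349773   -4.208202
------------------------------------------------------------------------------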

The estimated total effect (TE) is −4.8 points on our (arbitrary) health-related quality-of-life scale. This effect is interpreted just like an average treatment effect: if every individual in the population had arthritis, the population average health-related quality-of-life score would be about 5 points lower than if no one in the population had arthritis. In other words, arthritis causes a decrease of roughly 5 points in health-related quality of life, on average.

The remaining estimates are the natural indirect effects (NIEs) and the natural direct effect (NDE). They tell us how much of this 5-point difference is due to physical activity (NIE1), how much to depression (NIE2), and how much to neither (NDE). The NIE1 estimate is −1.8, so almost two points of the difference are due to decreased physical activity. The indirect effect of arthritis on hrqol through depress (NIE2) is −1.3, so a further 1.3 points are due to increased depression. Finally, the estimated NDE indicates that 1.7 points of the difference are due to mechanisms other than physical activity and depression.
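As a quick arithmetic check on this decomposition, the three path-specific estimates can be added and compared with the reported total effect, and each can be expressed as a share of the total. The following is only a sketch: it copies the point estimates by hand from the output above into Stata's display calculator, does not use any results stored by mediate, and the shares are plain ratios of point estimates with no standard errors attached.

```stata
* Values below are copied by hand from the mediate output above.
* NDE + NIE1 + NIE2; compare with the reported TE of -4.778988
display %10.6f (-1.71165) + (-1.814969) + (-1.252368)

* Share of the total effect through physact (NIE1 / TE)
display %6.3f (-1.814969) / (-4.778988)

* Share of the total effect through depress (NIE2 / TE)
display %6.3f (-1.252368) / (-4.778988)

* Share of the total effect that is direct (NDE / TE)
display %6.3f (-1.71165) / (-4.778988)
```

The sum agrees with the reported TE up to rounding in the last digit, and the shares come to roughly 38% via physact, 26% via depress, and 36% direct.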

Including interactions

The model above is somewhat restrictive in the sense that we apply strictly linear equations without interactions among treatment and mediators. In the single-mediator case, including a treatment–mediator interaction yields two sets of effect estimands, pure and total natural effects, resulting in two different decompositions of the total effect. Using two mediators with a parallel mediation design, however, yields a total of six decompositions and can yield as many as four estimands per effect of interest.

In the following model, we include interactions as well as covariates. We suspect that age is a confounder of the treatment–outcome and the mediator–outcome relationships, so we include age in the model for hrqol. The two mediator equations each include i.male as a covariate. To allow the effect of age on the outcome to vary as a function of the two mediators, we also include interaction terms by using factor-variable notation (c.depress#c.age c.physact#c.age). Finally, to allow for treatment–mediator and mediator–mediator interactions, we specify the tinteraction and minteraction options.

. mediate (hrqol age c.depress#c.age c.physact#c.age)
           (depress i.male)
           (physact i.male)
           (arthritis), tinteraction minteraction

Iteration 0:  EE criterion = 1.762e-25
Iteration 1:  EE criterion = 8.149e-30

Causal mediation analysis                                Number of obs = 2,000

Mediation type: Parallel
Mediator 1:     physact
Mediator 2:     depress
Treatment type: Binary
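------------------------------------------------------------------------------
             |               Robust
       hrqol | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NDE          |
          00 |   -2.93327   .0986156   -29.74   0.000    -3.126553   -2.739987
          10 |    -2.2664   .1321111   -17.16   0.000    -2.525333   -2.007467
          01 |   -3.13574   .1548724   -20.25   0.000    -3.439284   -2.832196
          11 |  -2.441183   .0633247   -38.55   0.000    -2.565297   -2.317068
-------------+----------------------------------------------------------------
NIE1         |
          00 |  -2.559244   .2330217   -10.98   0.000    -3.015958   -2.102529
          10 |  -1.892373   .1842448   -10.27   0.000    -2.253486    -1.53126
          01 |  -2.596165   .2396821   -10.83   0.000    -3.065933   -2.126396
          11 |  -1.901607   .1889659   -10.06   0.000    -2.271974   -1.531241
-------------+----------------------------------------------------------------
NIE2         |
          00 |  -.4994956   .0965284    -5.17   0.000    -.6886879   -.3103033
          10 |  -.7019653   .1137533    -6.17   0.000    -.9249177   -.4790128
          01 |  -.5364167   .0960709    -5.58   0.000    -.7247123   -.3481211
          11 |  -.7111994   .1138654    -6.25   0.000    -.9343716   -.4880273
-------------+----------------------------------------------------------------
TE           |
   arthritis |
(Yes vs No)  |  -5.536843   .2526692   -21.91   0.000    -6.032065    -5.04162
------------------------------------------------------------------------------
Note: Outcome equation includes treatment–mediator interactions and mediator–mediator interaction.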

We now have four NDEs and four of each NIE! The effects are now referred to by an index, for example, NDE-00 or NIE1-11. Each of the effects is defined as the difference in two potential-outcome means, and these indices correspond to the counterfactual treatment levels for the potential outcomes. For example, the estimated NDE labeled 00 is −2.9 and is the natural direct effect when both of the mediators are at their values associated with no treatment (arthritis = 0). The NDEs range from −2.3 to −3.1, while the NIEs via physact range from −1.9 to −2.6, and the NIEs via depress range from −0.5 to −0.7. How much variation there is among each type of effect depends on the size of the interaction effects. In this case, while we do see variation, we still find that the largest effects are those mediated via physact (NIE1) and the direct effects (NDE).
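With interactions in the model, it matters which NDE, NIE1, and NIE2 are added together. As an illustration only, the index combinations below were found by adding the reported point estimates and keeping the triples that reproduce the reported TE of −5.536843. The pairing is inferred from the arithmetic of this output, not taken from the manual, so treat it as an assumption to check against [CAUSAL] mediation multiple.

```stata
* Point estimates copied by hand from the output above.
* Each line adds one NDE, one NIE1, and one NIE2; compare with TE = -5.536843.

* NDE-00 + NIE1-10 + NIE2-11
display %10.6f (-2.93327) + (-1.892373) + (-.7111994)

* NDE-00 + NIE1-11 + NIE2-10
display %10.6f (-2.93327) + (-1.901607) + (-.7019653)

* NDE-10 + NIE1-00 + NIE2-11
display %10.6f (-2.2664) + (-2.559244) + (-.7111994)

* NDE-01 + NIE1-11 + NIE2-00
display %10.6f (-3.13574) + (-1.901607) + (-.4994956)

* NDE-11 + NIE1-00 + NIE2-01
display %10.6f (-2.441183) + (-2.559244) + (-.5364167)

* NDE-11 + NIE1-01 + NIE2-00
display %10.6f (-2.441183) + (-2.596165) + (-.4994956)
```

Each of the six sums matches the TE to within rounding of the displayed digits. An arbitrary combination generally does not: NDE-00 + NIE1-00 + NIE2-00, for example, comes to about −5.99.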

Sequential mediation

In the two examples above, our causal model was specified such that there was no causal relation between the two mediators. Continuing with the quality-of-life data, we could instead hypothesize that physical activity causes a reduction in depression. Accounting for this causal ordering, we can draw a causal diagram that contains the additional path from physact to depress:

[Causal diagram: the parallel model's paths plus physact \(\rightarrow\) depress]

Because we have a causal path from one mediator to the other, we refer to this model as a sequential mediation model. Now there are four path-specific effects of interest: one mediated through physact alone, one mediated through depress alone, one mediated through both physact and depress, and one not mediated by either physact or depress.

Let's first fit a simple variation of this model with no interaction terms. We specify the sequential option to tell mediate that we wish to fit a sequential model. We indicate the causal order of the mediators by specifying the causal sequence from right (treatment) to left (outcome). That is, to specify that physact affects depress, we type

. mediate (hrqol age) (depress i.male) (physact i.male) (arthritis), sequential

If we wanted the causal order reversed, we would specify

. mediate (hrqol age) (physact i.male) (depress i.male) (arthritis), sequential

As before, this model is rather restrictive because it does not include any interaction terms. To include treatment–mediator interactions in the outcome equation, we can specify the tinteraction option:

. mediate (hrqol age) (depress i.male) (physact i.male) (arthritis), sequential tinteraction

Estimating all path-specific effects in the presence of interactions produces a relatively large number of estimands. We can reduce that number with a coarser decomposition that groups some of the path-specific effects together. For example, suppose we were primarily interested in the indirect effect via physact. In that case, we could group together the indirect effects that go through depress, which here are the paths arthritis \(\rightarrow\) depress \(\rightarrow\) hrqol and arthritis \(\rightarrow\) physact \(\rightarrow\) depress \(\rightarrow\) hrqol, and focus on the indirect effect via physact alone. We can estimate these mediator-specific effects by specifying the mseffects(physact) option:

. mediate (hrqol age) (depress i.male) (physact i.male) (arthritis), sequential tinteraction mseffects(physact)
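The grouping described above can be written schematically. The subscripts below are illustrative labels chosen for this sketch, not the notation used by mediate or the manual. The display also ignores the counterfactual indices that each term carries once interactions are present, so it shows only which paths are kept separate and which are pooled.

```latex
\begin{align*}
\text{TE} &= \text{NDE}
  + \text{NIE}_{\text{physact}}
  + \text{NIE}_{\text{depress}}
  + \text{NIE}_{\text{physact}\rightarrow\text{depress}} \\
&= \text{NDE}
  + \underbrace{\text{NIE}_{\text{physact}}}_{\text{via physact alone}}
  + \underbrace{\text{NIE}_{\text{depress}}
  + \text{NIE}_{\text{physact}\rightarrow\text{depress}}}_{\text{grouped: any path through depress}}
\end{align*}
```

The first line is the finest decomposition into four path-specific effects. The second line is the coarser version, in which the two paths that pass through depress are reported as a single quantity.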

Controlled direct effects

A practitioner might want to perform an intervention study that targets one of the causal mechanisms with the intent to improve quality of life for people with arthritis. To this end, we might ask: if we could increase individuals' physical activity, could we shrink the gap in quality of life between people with and without arthritis? To answer this sort of question, we can estimate controlled direct effects (CDEs), which capture the direct effect of the treatment on the outcome when mediators are fixed at certain population values of interest. We start by evaluating the direct effect at the means of our mediators physact and depress:

. estat cde, mvalue( (mean) physact depress)

Controlled direct effect                                 Number of obs = 2,000

Mediator variables: physact depress
Mediator values:
  physact =     2.247 (mean)
  depress = 6.2440063 (mean)

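------------------------------------------------------------------------------
             |            Delta-method
             |        CDE   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   arthritis |
(Yes vs No)  |  -2.603496   .0609992   -42.68   0.000    -2.723052    -2.48394
------------------------------------------------------------------------------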

The estimated CDE is −2.6, which suggests that if everyone in the population were held at the average values of the mediators, arthritis would directly lower quality of life by 2.6 points. What would happen if we could increase the amount of physical activity to five or even eight hours per month? To answer this question, we keep the mediator depress at its mean and evaluate the CDE at five and eight hours of physical activity:

. estat cde, mvalue( (mean) depress physact=(5 8) )

Controlled direct effect                                 Number of obs = 2,000

Mediator variables: physact depress
Mediator values:
  1._at: physact =         5
         depress = 6.2440063 (mean)
  2._at: physact =         8
         depress = 6.2440063 (mean)

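------------------------------------------------------------------------------
             |            Delta-method
             |        CDE   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  arthritis@ |
         _at |
(Yes vs No)  |
           1 |   -2.73777   .1385674   -19.76   0.000    -3.009357   -2.466183
(Yes vs No)  |
           2 |  -2.884092   .2578233   -11.19   0.000    -3.389416   -2.378767
------------------------------------------------------------------------------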

The results show that if everyone in the population were to have 5 hours of physical activity (and an average depression score), the difference in quality of life due to arthritis would still be around −2.7 and not much different from the above result where physact was held at its average of 2.247. Similarly, for 8 hours of physical activity, the CDE is −2.9. Using the contrast option, we can obtain a difference between the two as well as a test:

. estat cde, mvalue( (mean) depress physact=(5 8) ) contrast

Controlled direct effect                                 Number of obs = 2,000

Mediator variables: physact depress
Mediator values:
  1._at: physact =         5
         depress = 6.2440063 (mean)
  2._at: physact =         8
         depress = 6.2440063 (mean)

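------------------------------------------------------------------------------
             |            Delta-method
             |        CDE   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        _at# |
   arthritis |
  (2 vs 1)   |
(Yes vs No)  |  -.1463215   .1259133    -1.16   0.245    -.3931072    .1004641
------------------------------------------------------------------------------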

The difference (−0.15) is close to zero and not statistically significant (p = 0.245), which suggests that physical activity increases quality of life but does not increase it more for people with arthritis. In other words, the intervention could be helpful for all individuals but would not close the gap in quality of life between people with and without arthritis.
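The reported contrast is the difference between the two CDEs in the preceding output, and the same per-hour change shows up between the mean of physact and 5 hours. A minimal sketch using Stata's display calculator, with all values copied by hand from the outputs above rather than taken from stored results:

```stata
* CDE at physact = 8 minus CDE at physact = 5; compare with the contrast -.1463215
display %10.7f (-2.884092) - (-2.73777)

* Change in the CDE per additional hour of physical activity, from 5 to 8 hours
display %10.7f ((-2.884092) - (-2.73777)) / (8 - 5)

* Same quantity between the mean of physact (2.247 hours) and 5 hours
display %10.7f ((-2.73777) - (-2.603496)) / (5 - 2.247)
```

Both per-hour figures come to about −0.049, which is consistent with a CDE that changes linearly in physact in this model. The slope is small relative to the CDE itself, in line with the conclusion that more physical activity would not close the gap.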

Tell me more

Read more about causal mediation analysis with two mediators in [CAUSAL] mediation multiple in the Stata Causal Inference and Treatment-Effects Estimation Reference Manual.
