Causal mediation analysis

Highlights

- Linear and generalized linear models:
  - Continuous, binary, and count outcomes
  - Continuous, binary, and count mediators
  - Binary, multivalued, and continuous treatments
  - Linear, logit, probit, Poisson, and exponential-mean models for outcome and mediator
- Direct effects, indirect effects, total effect, potential-outcome means (POMs), and controlled direct effects
- Proportion mediated
- Effects on the odds-ratio, risk-ratio, and incidence-rate-ratio scales
- Plots of effects

The new mediate command in Stata 18 extends Stata's powerful causal-inference suite to support causal mediation analysis. Causal analysis identifies and quantifies causal effects; causal mediation analysis disentangles how those effects arise. Is part of an effect transmitted through another variable, a mediator?

Choose one of 23 combinations of outcome and mediator models, including linear, logit, and Poisson, to estimate the total effect and decompose it into direct and indirect (through the mediator) effects. Compute controlled direct effects and proportion mediated. Recast effects into odds, risk, and incidence-rate ratios. Plot estimated effects. Obtain predictions. And more.

Let's see it work

We wish to find out whether physical exercise leads to an improvement in self-perceived well-being, and if so, to what extent. In addition, we wish to learn more about the mechanisms through which such causal effects operate. Perhaps exercising causes an increase in certain chemicals or hormones in the human body, which in turn affects perceptions of well-being. To answer questions like these, we use a causal mediation model to estimate the average treatment effect and decompose it into direct and indirect effects.

We have fictional data from a randomized controlled trial with individuals randomized into two groups—one group performs physical exercise, and the other group spends the same amount of time in a resting state. Subjective well-being is measured before and after treatment sessions. In addition, the level of the (fictional) hormone bonotonin is measured. The researchers wish to determine whether exercise leads to an increase in bonotonin levels, which in turn has a positive effect on subjective well-being.

We use a binary outcome variable (wellbeing) that indicates an increase in well-being of at least 10% compared with a baseline measurement. The mediator variable (bonotonin) is also binary and measures whether there was an increase of at least 10% in the production of this chemical.
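Both indicators reduce to a simple threshold comparison against a baseline. The sketch below is purely illustrative: the helper name `increased_by_10pct` and its baseline/follow-up arguments are hypothetical, because the raw measurements behind the fictional dataset are not described here, and it assumes a positive baseline.

```python
def increased_by_10pct(baseline: float, follow_up: float) -> int:
    """Return 1 if follow_up is at least 10% above baseline, else 0.

    Hypothetical helper; assumes baseline > 0. The relative change is
    compared with 0.10 rather than testing follow_up >= 1.1 * baseline,
    because 1.1 * 100 evaluates to 110.00000000000001 in floating point
    and would wrongly exclude an exact 10% increase.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_change = (follow_up - baseline) / baseline
    return int(relative_change >= 0.10)


# Hypothetical (baseline, follow-up) scores for three individuals
pairs = [(100, 110), (100, 109), (50, 80)]
print([increased_by_10pct(b, f) for b, f in pairs])  # [1, 0, 1]
```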

We use a logit model for both outcome and mediator and specify the following causal mediation model:

. mediate (wellbeing, logit) (bonotonin, logit) (exercise)

Iteration 0:  EE criterion =  2.047e-17
Iteration 1:  EE criterion =  1.570e-32

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Logit
Mediator model:    Logit
Mediator variable: bonotonin
Treatment type:    Binary


                             Robust
   wellbeing   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

NIE           
    exercise  
  (Exercise   
         vs   
   Control)      .1130778   .0287055     3.94   0.000      .056816    .1693397

NDE           
    exercise  
  (Exercise   
         vs   
   Control)      .1457939   .0357194     4.08   0.000     .0757851    .2158027

TE            
    exercise  
  (Exercise   
         vs   
   Control)      .2588717   .0213767    12.11   0.000     .2169742    .3007692

Note: Outcome equation includes treatment–mediator interaction.
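Because both models are logit, the command lists no covariates, and the note reports a treatment-mediator interaction in the outcome equation, the fitted model can be sketched as follows, writing T for exercise, M for bonotonin, and Y for wellbeing. The coefficient symbols are illustrative notation, not labels that Stata reports.

```latex
\begin{aligned}
\operatorname{logit} \Pr(M = 1 \mid T)    &= \alpha_0 + \alpha_1 T \\
\operatorname{logit} \Pr(Y = 1 \mid T, M) &= \beta_0 + \beta_1 T + \beta_2 M + \beta_3 \, T M
\end{aligned}
```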

The estimated total effect (TE) is 0.26. Because our outcome variable is binary, this effect is measured on the probability scale, and we interpret it just like an average treatment effect: if everyone in the population exercised, the probability of improved well-being would be 0.26 higher (26 percentage points) than if no one exercised.
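In potential-outcome notation, let Y(t, m) be the outcome under treatment t with the mediator set to m, and M(t) the mediator value under treatment t. Under the usual textbook definitions, assumed here (the NDE holds the mediator at its untreated value, and the NIE is evaluated under treatment; check the mediate documentation for the exact convention Stata uses), the three effects in the table are:

```latex
\begin{aligned}
\mathrm{TE}  &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} \\
\mathrm{NDE} &= E\{Y(1, M(0))\} - E\{Y(0, M(0))\} \\
\mathrm{NIE} &= E\{Y(1, M(1))\} - E\{Y(1, M(0))\} \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}
\end{aligned}
```

The last line holds because the cross-world term E{Y(1, M(0))} cancels. The reported estimates satisfy it exactly: 0.1457939 + 0.1130778 = 0.2588717.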

The remaining rows of the table are the estimates of the natural indirect effect (NIE) and the natural direct effect (NDE). The NIE measures how much of the effect of exercise on well-being operates through the production of bonotonin; the NDE captures the effect of exercise on well-being through all other mechanisms. Here the NIE of 0.11 means that 0.11 of the increase in the probability of improved well-being is due to exercise acting through bonotonin, and the NDE of 0.15 is due to other mechanisms. Together, the two add up to the TE of 0.26.
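To see where such numbers come from, the sketch below applies the mediation formula, a standard identification result, to two logit models with a binary mediator. It assumes randomized treatment, no covariates, and no unmeasured mediator-outcome confounding. The coefficients in `ALPHA` and `BETA` and all function names are hypothetical; they are not the estimates behind the output above, and this is not how mediate computes its results internally (the iteration log suggests it solves estimating equations).

```python
import math


def expit(x: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, NOT the estimates behind the output above
ALPHA = (-0.5, 1.0)            # mediator model: alpha0, alpha1
BETA = (-1.0, 0.6, 0.8, 0.3)   # outcome model: beta0, beta1, beta2, beta3


def p_mediator(t: int) -> float:
    """P(M = 1 | T = t) from the mediator logit model."""
    a0, a1 = ALPHA
    return expit(a0 + a1 * t)


def p_outcome(t: int, m: int) -> float:
    """P(Y = 1 | T = t, M = m) from the outcome logit model with a T x M interaction."""
    b0, b1, b2, b3 = BETA
    return expit(b0 + b1 * t + b2 * m + b3 * t * m)


def pom(t: int, t_m: int) -> float:
    """E[Y(t, M(t_m))]: P(Y = 1) under treatment t, with the binary mediator
    distributed as it would be under treatment t_m (mediation formula)."""
    pm1 = p_mediator(t_m)
    return p_outcome(t, 1) * pm1 + p_outcome(t, 0) * (1.0 - pm1)


te = pom(1, 1) - pom(0, 0)
nde = pom(1, 0) - pom(0, 0)
nie = pom(1, 1) - pom(1, 0)
print(f"TE={te:.4f}  NDE={nde:.4f}  NIE={nie:.4f}")
assert math.isclose(te, nde + nie)  # additive decomposition on the probability scale

# The estimates reported by mediate satisfy the same identity
reported_nie, reported_nde, reported_te = 0.1130778, 0.1457939, 0.2588717
assert math.isclose(reported_nie + reported_nde, reported_te, abs_tol=1e-7)
```

Because the decomposition is built from the same three potential-outcome means, the identity TE = NDE + NIE holds by construction, whatever coefficients are plugged in.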

So far, we have interpreted the estimated effects as risk differences on the probability scale. We could instead express them as risk ratios or odds ratios. To get odds ratios, for instance, we use the postestimation command estat or:

. estat or
estat or requires potential-outcome means; refitting model ...

Transformed treatment effects                            Number of obs = 2,000


                             Robust
   wellbeing   Odds ratio   std. err.      z    P>|z|     [95% conf. interval]

NIE           
    exercise  
  (Exercise   
         vs   
   Control)      1.575108   .1827157     3.92   0.000     1.254785    1.977204

NDE           
    exercise  
  (Exercise   
         vs   
   Control)       1.87189   .2785201     4.21   0.000     1.398393    2.505713

TE            
    exercise  
  (Exercise   
         vs   
   Control)      2.948429   .2768771    11.51   0.000     2.452772    3.544249

The total effect corresponds to an odds ratio of 2.95, which in this case is the product of the direct- and indirect-effect odds ratios. To compute risk ratios, we could use estat rr, and had we fit a Poisson model for the outcome, we could have used estat irr to compute incidence-rate ratios.
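The product relationship follows if each odds ratio is formed from the same potential-outcome means as the corresponding risk difference (an assumption here, though consistent with the reported numbers): the odds under E{Y(1, M(0))} cancel when the NIE and NDE ratios are multiplied. The sketch below shows this with hypothetical POM values and then checks the reported estimates; `odds` and `odds_ratios` are illustrative names.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratios(p11: float, p10: float, p00: float) -> dict:
    """NIE, NDE, and TE odds ratios from three potential-outcome means:
    p11 = E[Y(1, M(1))], p10 = E[Y(1, M(0))], p00 = E[Y(0, M(0))]."""
    return {
        "NIE": odds(p11) / odds(p10),
        "NDE": odds(p10) / odds(p00),
        "TE": odds(p11) / odds(p00),
    }


# Hypothetical POMs, for illustration only
ors = odds_ratios(p11=0.60, p10=0.45, p00=0.30)
# odds(p10) cancels, so TE = NIE * NDE by construction
assert math.isclose(ors["TE"], ors["NIE"] * ors["NDE"])

# Odds ratios reported by estat or
nie_or, nde_or, te_or = 1.575108, 1.87189, 2.948429
print(f"NIE x NDE = {nie_or * nde_or:.6f}  vs  TE = {te_or}")
assert math.isclose(nie_or * nde_or, te_or, rel_tol=1e-6)
```

No such product rule holds for the risk differences; on the difference scale, the effects add instead.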

We can also estimate controlled direct effects (CDEs) using estat cde. For example, we may be interested in the direct effect under two counterfactual scenarios: every individual in the population experiences an increase in bonotonin levels, or no one does. To estimate these, we specify estat cde with option mvalue(0 1), which sets the (binary) mediator bonotonin to 0 and to 1 and estimates the average controlled direct effect of the treatment at each value:

. estat cde, mvalue(0 1)

Controlled direct effect                                 Number of obs = 2,000

Mediator variable: bonotonin
Mediator values:
  1._at: bonotonin = 0
  2._at: bonotonin = 1



                          Delta-method
                      CDE   std. err.      z    P>|z|     [95% conf. interval]

exercise@_at  
  (Exercise   
         vs   
   Control)   
          1      .1391299    .039573     3.52   0.000     .0615682    .2166916
  (Exercise   
         vs   
   Control)   
          2       .200756   .0505278     3.97   0.000     .1017234    .2997887

If no one experienced an increase in bonotonin levels (_at 1, bonotonin = 0), the direct effect of exercise on the probability of improved well-being would be 0.14; if everyone in the population experienced such an increase (_at 2, bonotonin = 1), it would be 0.20.
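In potential-outcome notation, with Y(t, m) the outcome under treatment t and mediator value m, the controlled direct effect fixes the mediator at the same value m for everyone and compares treatment with control:

```latex
\begin{aligned}
\mathrm{CDE}(m) &= E\{Y(1, m)\} - E\{Y(0, m)\}, \qquad m \in \{0, 1\} \\
\widehat{\mathrm{CDE}}(0) &\approx 0.139, \qquad \widehat{\mathrm{CDE}}(1) \approx 0.201
\end{aligned}
```

Unlike the NDE, which lets the mediator take the value it would naturally have under control, the CDE sets it by intervention, so there is one CDE per chosen value of m.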

Again, to express these effects as risk ratios or odds ratios, we can add option rr or option or, respectively. Here we use option or to estimate the controlled direct effects on the odds-ratio scale:

. estat cde, mvalue(0 1) or

Controlled direct effect                                 Number of obs = 2,000

Mediator variable: bonotonin
Mediator values:
  1._at: bonotonin = 0
  2._at: bonotonin = 1



                          Delta-method
               Odds ratio   std. err.      z    P>|z|     [95% conf. interval]

exercise@_at  
  (Exercise   
         vs   
   Control)   
          1      1.835771   .3049524     3.66   0.000     1.325621    2.542244
  (Exercise   
         vs   
   Control)   
          2      2.257759   .4785054     3.84   0.000     1.490306    3.420422

Finally, we might want to summarize the extent of mediation with a single number. We use estat proportion to compute the proportion mediated:

. estat proportion

Proportion mediated                                      Number of obs = 2,000


                             Robust
   wellbeing   Proportion   std. err.      z    P>|z|     [95% conf. interval]

    exercise  
  (Exercise   
         vs   
   Control)      .4368103   .1164639     3.75   0.000     .2085453    .6650752

The proportion mediated is computed from the effect estimates on the probability scale: the indirect effect accounts for an estimated 44% of the total effect (0.113/0.259).
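The point estimate is simply the ratio NIE/TE of the probability-scale estimates, as the short check below reproduces; `proportion_mediated` is an illustrative helper name. Standard errors and confidence intervals require the joint covariance of the two estimates, so for inference use estat proportion rather than this hand calculation.

```python
def proportion_mediated(nie: float, te: float) -> float:
    """Share of the total effect transmitted through the mediator (NIE / TE).

    Interpretable mainly when NIE and TE have the same sign; the ratio is
    unstable when TE is close to zero.
    """
    if te == 0:
        raise ZeroDivisionError("total effect is zero; proportion undefined")
    return nie / te


# Point estimates reported by mediate on the probability scale
pm = proportion_mediated(nie=0.1130778, te=0.2588717)
print(f"proportion mediated = {pm:.4f}")
```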

Tell me more

View all the new features in Stata 18.