Note: Based on a Statalist post by Jeff Pitblado:
http://www.statalist.org/forums/forum/general-stata-discussion/general/98078-delta-method-standard-errors-for-average-marginal-effects-using-margins-command.

Title | Compute standard errors with margins | |

Author | Jeff Pitblado, StataCorp |

In the following, I use the **nofvlabel** option so that the
output aligns with the expressions I use. **nofvlabel** is a display option
that is common to **margins** and
estimation commands. This option was
introduced in Stata 13, where we now show the value labels for factor
variables by default.

Here is an example using **logit**:

.webuse margex(Artificial data for margins) .logit outcome i.treatment distance, nofvlabelIteration 0: log likelihood = -1366.0718 Iteration 1: log likelihood = -1257.5623 Iteration 2: log likelihood = -1244.2136 Iteration 3: log likelihood = -1242.8796 Iteration 4: log likelihood = -1242.8245 Iteration 5: log likelihood = -1242.8245 Logistic regression Number of obs = 3,000 LR chi2(2) = 246.49 Prob > chi2 = 0.0000 Log likelihood = -1242.8245 Pseudo R2 = 0.0902

outcome | Coefficient Std. err. z P>|z| [95% conf. interval] | |

1.treatment | 1.42776 .113082 12.63 0.000 1.206124 1.649397 | |

distance | -.0047747 .0011051 -4.32 0.000 -.0069406 -.0026088 | |

_cons | -2.337762 .0962406 -24.29 0.000 -2.52639 -2.149134 | |

I will show how **margins** computes standard errors (SEs) of average
marginal effects (AMEs). The remaining discussion has two parts. The first
part describes how to compute AMEs and their SE estimates for factor variables; the
second part concerns continuous variables.

Because the AME of a two-level factor variable is just the
difference between the two predictive margins, we start by computing the
SEs for predictive margins. Here we use **margins** to compute
the predictive margins for the two levels of **treatment**:

.margins treatment, nofvlabelPredictive margins Number of obs = 3,000 Model VCE: OIM Expression: Pr(outcome), predict()

Delta-method | ||

Margin std. err. z P>|z| [95% conf. interval] | ||

treatment | ||

0 | .0791146 .0069456 11.39 0.000 .0655016 .0927277 | |

1 | .2600204 .0111772 23.26 0.000 .2381135 .2819272 | |

We make copies of two matrices from the **margins**'s stored results to compare
later. **r(Jacobian)** is the Jacobian matrix, which will be explained later.
**r(V)** is the estimated variance matrix that corresponds with the
reported predictive margins. The square root of the diagonal elements are
reported in the above column labeled “Delta-method Std. Err.”.

.matrix rJ = r(Jacobian).matrix rV = r(V).display sqrt(rV[1,1]).00694557 .display sqrt(rV[2,2]).01117718

The delta method allows us to obtain the appropriate SEs of any smooth function of the fitted model parameters. It basically involves applying a Jacobian matrix to the estimated variance matrix of the fitted model parameters. The Jacobian is a matrix of partial derivatives of the statistics of interest with respect to each of the fitted model parameters.

Before we start taking derivatives, let's see how the predictive
margins are essentially computed. This will provide us with the formulas
from which we will work out the derivatives that go into the Jacobian
matrix. In the following, **p0** is the variable used to re-create the
predictive margin for **0.treatment**, and **p1** corresponds to
**1.treatment**. The means of these variables are the predictive margins.

.generate p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons]).generate p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons]).sum p0 p1

Variable | Obs Mean Std. dev. Min Max | |

p0 | 3,000 .0791146 .0212555 .0026954 .0879477 | |

p1 | 3,000 .2600204 .0688961 .011143 .2867554 |

Now we will make the calculations necessary to reproduce the Jacobian matrix, which we will then use to reproduce the estimated variance matrix. The rows identify what we are taking the partial derivative of; the columns identify what we are taking the partial derivative with respect to. Thus the rows correspond with the predictive margins, and the columns correspond with the fitted model parameters in the logistic regression.

.generate dp0dxb = p0*(1-p0).generate dp1dxb = p1*(1-p1).generate zero = 0.generate one = 1.matrix vecaccum J0 = dp0dxb zero zero distance.matrix vecaccum J1 = dp1dxb zero one distance.matrix Jac = J0/e(N) \ J1/e(N).matrix V = Jac*e(V)*Jac'.matrix list JacJac[2,4] zero zero distance _cons dp0dxb 0 0 .74390659 .07240388 dp1dxb 0 .18766468 2.1907626 .18766468 .matrix list rJrJ[2,4] outcome: outcome: outcome: outcome: 0b. 1. treatment treatment distance _cons 0.treatment 0 0 .74390659 .07240388 1.treatment 0 .18766468 2.1907626 .18766468 .matrix list Vsymmetric V[2,2] dp0dxb dp1dxb dp0dxb .00004824 dp1dxb -1.181e-07 .00012493 .matrix list rVsymmetric rV[2,2] 0. 1. treatment treatment 0.treatment .00004824 1.treatment -1.181e-07 .00012493

Now let's compute the marginal effect of **treatment**. **margins**
keeps a placeholder for the base outcome of **treatment**; the Jacobian
and variance elements are merely set to 0.

.margins, dydx(treatment) nofvlabelAverage marginal effects Number of obs = 3,000 Model VCE: OIM Expression: Pr(outcome), predict() dy/dx wrt: 1.treatment

Delta-method | ||

dy/dx std. err. z P>|z| [95% conf. interval] | ||

1.treatment | .1809057 .0131684 13.74 0.000 .1550961 .2067153 | |

The marginal effect of **treatment** is just the difference between the
predictive margins. Here we take advantage of the fact that the difference of
the means is the mean of the differences.

.generate pdiff = p1 - p0.sum pdiff

Variable | Obs Mean Std. dev. Min Max | |

pdiff | 3,000 .1809057 .0476499 .0084476 .1988077 |

The Jacobian is similarly composed from the previous calculations. We compose it
ignoring the base level of **treatment**. Notice that the elements of
**Jdiff** match those of the second row of **r(Jacobian)** above.

.matrix Jdiff = (-1, 1)*Jac.matrix list JdiffJdiff[1,4] zero zero distance _cons r1 0 .18766468 1.446856 .11526081 .matrix Vdiff = Jdiff*e(V)*Jdiff'.matrix list Vdiffsymmetric Vdiff[1,1] r1 r1 .00017341

The marginal effect of a continuous variable is essentially computed the same
way, but there are more derivatives involved. Here we have **margins**
compute the AME of **distance** and save copies of
**r(Jacobian)** and **r(V)**.

.margins, dydx(distance)Average marginal effects Number of obs = 3,000 Model VCE: OIM Expression: Pr(outcome), predict() dy/dx wrt: distance

Delta-method | ||

dy/dx std. err. z P>|z| [95% conf. interval] | ||

distance | -.0006228 .0001444 -4.31 0.000 -.0009058 -.0003399 | |

The AME of **distance** is the average of the
derivative of the prediction with respect to **distance**.

.predict p, pr.generate dpdxb = p*(1-p).generate dpdx = dpdxb*_b[distance].sum dpdx

Variable | Obs Mean Std. dev. Min Max | |

dpdx | 3,000 -.0006228 .0003231 -.0009766 -.0000128 |

For the Jacobian, we must compute the partials of **dpdx** with respect to
the model parameters. The partial with respect to the coefficient for
**distance** has an extra piece.

.generate d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p).matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance.matrix Jac = Jac*_b[distance]/e(N).sum dpdxb

Variable | Obs Mean Std. dev. Min Max | |

dpdxb | 3,000 .1304451 .0676672 .0026901 .2045267 |

In summary, I have shown how to compute discrete and continuous marginal
effects along with their corresponding SE estimates using the
delta method. This is essentially what **margins** does in all cases, except
that it often uses numerical derivatives.