 »  Home »  Resources & support »  FAQs »  Compute standard errors with margins

## How are average marginal effects and their standard errors computed by margins using the delta method?

 Title Compute standard errors with margins Author Jeff Pitblado, StataCorp

In the following, I use the nofvlabel option so that the output aligns with the expressions I use. nofvlabel is a display option that is common to margins and estimation commands. This option was introduced in Stata 13, where we now show the value labels for factor variables by default.

### Introduction

Here is an example using logit:

. webuse margex
(Artificial data for margins)

. logit outcome i.treatment distance, nofvlabel

Iteration 0:   log likelihood = -1366.0718
Iteration 1:   log likelihood = -1257.5623
Iteration 2:   log likelihood = -1244.2136
Iteration 3:   log likelihood = -1242.8796
Iteration 4:   log likelihood = -1242.8245
Iteration 5:   log likelihood = -1242.8245

Logistic regression                               Number of obs   =       3000
LR chi2(2)      =     246.49
Prob > chi2     =     0.0000
Log likelihood = -1242.8245                       Pseudo R2       =     0.0902

------------------------------------------------------------------------------
outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
_cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
------------------------------------------------------------------------------


I will show how margins computes standard errors (SEs) of average marginal effects (AMEs). The remaining discussion has two parts. The first part describes how to compute AMEs and their SE estimates for factor variables; the second part concerns continuous variables.

### Factor variables

Because the AME of a two-level factor variable is just the difference between the two predictive margins, we start by computing the SEs for predictive margins. Here we use margins to compute the predictive margins for the two levels of treatment:

. margins treatment, nofvlabel

Predictive margins                                Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
treatment |
0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
------------------------------------------------------------------------------


We make copies of two matrices from the margins's stored results to compare later. r(Jacobian) is the Jacobian matrix, which will be explained later. r(V) is the estimated variance matrix that corresponds with the reported predictive margins. The square root of the diagonal elements are reported in the above column labeled “Delta-method Std. Err.”.

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00694557

. display sqrt(rV[2,2])
.01117718


The delta method allows us to obtain the appropriate SEs of any smooth function of the fitted model parameters. It basically involves applying a Jacobian matrix to the estimated variance matrix of the fitted model parameters. The Jacobian is a matrix of partial derivatives of the statistics of interest with respect to each of the fitted model parameters.

Before we start taking derivatives, let's see how the predictive margins are essentially computed. This will provide us with the formulas from which we will work out the derivatives that go into the Jacobian matrix. In the following, p0 is the variable used to re-create the predictive margin for 0.treatment, and p1 corresponds to 1.treatment. The means of these variables are the predictive margins.

. generate p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])

. generate p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])

. sum p0 p1

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
p0 |      3000    .0791146    .0212555   .0026954   .0879477
p1 |      3000    .2600204    .0688961    .011143   .2867554


Now we will make the calculations necessary to reproduce the Jacobian matrix, which we will then use to reproduce the estimated variance matrix. The rows identify what we are taking the partial derivative of; the columns identify what we are taking the partial derivative with respect to. Thus the rows correspond with the predictive margins, and the columns correspond with the fitted model parameters in the logistic regression.

. generate dp0dxb = p0*(1-p0)

. generate dp1dxb = p1*(1-p1)

. generate zero = 0

. generate one = 1

. matrix vecaccum J0 = dp0dxb zero zero distance

. matrix vecaccum J1 = dp1dxb zero one  distance

. matrix Jac = J0/e(N) \ J1/e(N)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[2,4]
zero       zero   distance      _cons
dp0dxb          0          0  .74390659  .07240388
dp1dxb          0  .18766468  2.1907626  .18766468

. matrix list rJ

rJ[2,4]
outcome:   outcome:   outcome:   outcome:
0b.         1.
treatment  treatment   distance      _cons
0.treatment          0          0  .74390659  .07240388
1.treatment          0  .18766468  2.1907626  .18766468

. matrix list V

symmetric V[2,2]
dp0dxb      dp1dxb
dp0dxb   .00004824
dp1dxb  -1.181e-07   .00012493

. matrix list rV

symmetric rV[2,2]
0.          1.
treatment   treatment
0.treatment   .00004824
1.treatment  -1.181e-07   .00012493


Now let's compute the marginal effect of treatment. margins keeps a placeholder for the base outcome of treatment; the Jacobian and variance elements are merely set to 0.

. margins, dydx(treatment) nofvlabel

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. matrix list r(Jacobian)

r(Jacobian)[2,4]
outcome:   outcome:   outcome:   outcome:
0b.         1.
treatment  treatment   distance      _cons
0b.treatment          0          0          0          0
1.treatment          0  .18766468   1.446856  .11526081

. matrix list r(V)

symmetric r(V)[2,2]
0b.         1.
treatment  treatment
0b.treatment          0
1.treatment          0  .00017341


The marginal effect of treatment is just the difference between the predictive margins. Here we take advantage of the fact that the difference of the means is the mean of the differences.

. generate pdiff = p1 - p0

. sum pdiff

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
pdiff |      3000    .1809057    .0476499   .0084476   .1988077


The Jacobian is similarly composed from the previous calculations. We compose it ignoring the base level of treatment. Notice that the elements of Jdiff match those of the second row of r(Jacobian) above.

. matrix Jdiff = (-1, 1)*Jac

. matrix list Jdiff

Jdiff[1,4]
zero       zero   distance      _cons
r1          0  .18766468   1.446856  .11526081

. matrix Vdiff = Jdiff*e(V)*Jdiff'

. matrix list Vdiff

symmetric Vdiff[1,1]
r1
r1  .00017341


### Continuous variables

The marginal effect of a continuous variable is essentially computed the same way, but there are more derivatives involved. Here we have margins compute the AME of distance and save copies of r(Jacobian) and r(V).

. margins, dydx(distance)

Average marginal effects                          Number of obs   =       3000
Model VCE    : OIM

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : distance

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
------------------------------------------------------------------------------

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00014436


The AME of distance is the average of the derivative of the prediction with respect to distance.

. predict p, pr

. generate dpdxb = p*(1-p)

. generate dpdx = dpdxb*_b[distance]

. sum dpdx

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
dpdx |      3000   -.0006228    .0003231  -.0009766  -.0000128


For the Jacobian, we must compute the partials of dpdx with respect to the model parameters. The partial with respect to the coefficient for distance has an extra piece.

. generate d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)

. matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance

. matrix Jac = Jac*_b[distance]/e(N)

. sum dpdxb

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
dpdxb |      3000    .1304451    .0676672   .0026901   .2045267

. matrix Jac[1,3] = Jac[1,3] + r(mean)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[1,4]
0b.          1.
treatment   treatment    distance       _cons
d2pdxb2           0  -.00020228    .1256217  -.00034567

. matrix list rJ

rJ[1,4]
outcome:    outcome:    outcome:    outcome:
0b.          1.
treatment   treatment    distance       _cons
distance           0  -.00020228    .1256217  -.00034567

. matrix list V

symmetric V[1,1]
d2pdxb2
d2pdxb2  2.084e-08

. matrix list rV

symmetric rV[1,1]
distance
distance  2.084e-08


### Conclusion

In summary, I have shown how to compute discrete and continuous marginal effects along with their corresponding SE estimates using the delta method. This is essentially what margins does in all cases, except that it uses numerical derivatives for all but the linear prediction.