
How are average marginal effects and their standard errors computed by margins using the delta method?

Title:   Compute standard errors with margins
Author:  Jeff Pitblado, StataCorp

In the following, I use the nofvlabel option so that the output aligns with the expressions I use. nofvlabel is a display option common to margins and the estimation commands. It was introduced in Stata 13, which displays value labels for factor variables by default.

Introduction

Here is an example using logit:

. webuse margex
(Artificial data for margins)

. logit outcome i.treatment distance, nofvlabel

Iteration 0:   log likelihood = -1366.0718  
Iteration 1:   log likelihood = -1257.5623  
Iteration 2:   log likelihood = -1244.2136  
Iteration 3:   log likelihood = -1242.8796  
Iteration 4:   log likelihood = -1242.8245  
Iteration 5:   log likelihood = -1242.8245  

Logistic regression                                     Number of obs =  3,000
                                                        LR chi2(2)    = 246.49
                                                        Prob > chi2   = 0.0000
Log likelihood = -1242.8245                             Pseudo R2     = 0.0902

------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
    distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
       _cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
------------------------------------------------------------------------------

I will show how margins computes standard errors (SEs) of average marginal effects (AMEs). The remaining discussion has two parts. The first part describes how to compute AMEs and their SE estimates for factor variables; the second part concerns continuous variables.

Factor variables

Because the AME of a two-level factor variable is just the difference between the two predictive margins, we start by computing the SEs for predictive margins. Here we use margins to compute the predictive margins for the two levels of treatment:

. margins treatment, nofvlabel

Predictive margins                                       Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   treatment |
          0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
          1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
------------------------------------------------------------------------------

We make copies of two matrices from margins's stored results to compare later. r(Jacobian) is the Jacobian matrix, which will be explained below. r(V) is the estimated variance matrix that corresponds to the reported predictive margins. The square roots of its diagonal elements are the values reported above in the column labeled “Delta-method std. err.”.

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00694557

. display sqrt(rV[2,2])
.01117718

The delta method allows us to obtain the appropriate SEs of any smooth function of the fitted model parameters. It amounts to pre- and postmultiplying the estimated variance matrix of the fitted model parameters by a Jacobian matrix. The Jacobian is the matrix of partial derivatives of the statistics of interest with respect to each of the fitted model parameters.
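In symbols, if $\hat\beta$ denotes the fitted model parameters with estimated variance matrix $V(\hat\beta)$, and $m(\beta)$ is the vector of statistics of interest (here, the two predictive margins), the delta-method variance estimate is

$$
\widehat{\operatorname{Var}}\{m(\hat\beta)\} \;=\; J\,V(\hat\beta)\,J',
\qquad
J \;=\; \left.\frac{\partial m(\beta)}{\partial \beta'}\right|_{\beta=\hat\beta}
$$

This is the calculation reproduced below by the command matrix V = Jac*e(V)*Jac'.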

Before we start taking derivatives, let's see how the predictive margins are essentially computed. This will provide us with the formulas from which we will work out the derivatives that go into the Jacobian matrix. In the following, p0 is the variable used to re-create the predictive margin for 0.treatment, and p1 corresponds to 1.treatment. The means of these variables are the predictive margins.

. generate p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])

. generate p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])

. sum p0 p1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          p0 |      3,000    .0791146    .0212555   .0026954   .0879477
          p1 |      3,000    .2600204    .0688961     .011143   .2867554

Now we will make the calculations necessary to reproduce the Jacobian matrix, which we will then use to reproduce the estimated variance matrix. The rows identify what we are taking the partial derivative of; the columns identify what we are taking the partial derivative with respect to. Thus the rows correspond with the predictive margins, and the columns correspond with the fitted model parameters in the logistic regression.
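Concretely (this formula is implied by, though not spelled out in, the output below), the predictive margin for treatment level $t$ is $\frac{1}{N}\sum_i \operatorname{invlogit}(xb_{ti})$ with $xb_{ti} = \beta_{t.\mathrm{treatment}} + \beta_{\mathrm{distance}}\,\mathrm{distance}_i + \beta_{\_\mathrm{cons}}$. Because $\frac{d}{dz}\operatorname{invlogit}(z) = \operatorname{invlogit}(z)\{1-\operatorname{invlogit}(z)\}$, the Jacobian entry in row $t$ and the column for coefficient $\beta_j$ is

$$
\frac{\partial}{\partial \beta_j}\,\frac{1}{N}\sum_{i=1}^{N} p_{ti}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} p_{ti}(1-p_{ti})\,x_{ij},
$$

where $x_{ij}$ is 0 or 1 for the treatment indicators, $\mathrm{distance}_i$ for $\beta_{\mathrm{distance}}$, and 1 for the constant. The vecaccum calls below accumulate exactly these sums, and dividing by e(N) turns them into averages.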

. generate dp0dxb = p0*(1-p0)

. generate dp1dxb = p1*(1-p1)

. generate zero = 0

. generate one = 1

. matrix vecaccum J0 = dp0dxb zero zero distance

. matrix vecaccum J1 = dp1dxb zero one distance

. matrix Jac = J0/e(N) \ J1/e(N)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[2,4]
             zero       zero   distance      _cons
dp0dxb          0          0  .74390659  .07240388
dp1dxb          0  .18766468  2.1907626  .18766468

. matrix list rJ

rJ[2,4]
               outcome:   outcome:   outcome:   outcome:
                    0b.         1.                      
             treatment  treatment   distance      _cons
0.treatment          0          0  .74390659  .07240388
1.treatment          0  .18766468  2.1907626  .18766468

. matrix list V

symmetric V[2,2]
            dp0dxb      dp1dxb
dp0dxb   .00004824
dp1dxb  -1.181e-07   .00012493

. matrix list rV

symmetric rV[2,2]
                      0.          1.
              treatment   treatment
0.treatment   .00004824
1.treatment  -1.181e-07   .00012493

Now let's compute the marginal effect of treatment. margins keeps a placeholder for the base level of treatment; the corresponding Jacobian and variance elements are simply set to 0.

. margins, dydx(treatment) nofvlabel

Average marginal effects                                 Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  1.treatment

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. matrix list r(Jacobian)

r(Jacobian)[2,4]
                 outcome:   outcome:   outcome:   outcome:
                      0b.         1.
                treatment  treatment   distance      _cons
0b.treatment            0          0          0          0
 1.treatment            0  .18766468   1.446856  .11526081

. matrix list r(V)

symmetric r(V)[2,2]
                      0b.         1.
                treatment  treatment
0b.treatment            0
 1.treatment            0  .00017341

The marginal effect of treatment is just the difference between the predictive margins. Here we take advantage of the fact that the difference of the means is the mean of the differences.
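In symbols,

$$
\frac{1}{N}\sum_{i=1}^{N} p_{1i} \;-\; \frac{1}{N}\sum_{i=1}^{N} p_{0i}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} \bigl(p_{1i}-p_{0i}\bigr),
$$

and because differentiation is linear, the Jacobian of this difference is simply the difference of the two Jacobian rows computed above, $J_{\mathrm{diff}} = J_1 - J_0$, which is what the (-1, 1)*Jac multiplication below produces.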

. generate pdiff = p1 - p0

. sum pdiff

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       pdiff |      3,000    .1809057    .0476499   .0084476   .1988077

The Jacobian for this difference is composed from the previous calculations, ignoring the placeholder for the base level of treatment. Notice that the elements of Jdiff match those of the second row of r(Jacobian) above.

. matrix Jdiff = (-1, 1)*Jac

. matrix list Jdiff

Jdiff[1,4]
         zero       zero   distance      _cons
r1          0  .18766468   1.446856  .11526081

. matrix Vdiff = Jdiff*e(V)*Jdiff'

. matrix list Vdiff

symmetric Vdiff[1,1]
           r1
r1  .00017341
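As a check (this command is not part of the original listing), the square root of the single element of Vdiff should reproduce the delta-method SE that margins reported above for 1.treatment, .0131684:

. display sqrt(Vdiff[1,1])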

Continuous variables

The marginal effect of a continuous variable is essentially computed the same way, but there are more derivatives involved. Here we have margins compute the AME of distance and save copies of r(Jacobian) and r(V).

. margins, dydx(distance)

Average marginal effects                                 Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  distance

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
------------------------------------------------------------------------------

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00014436

The AME of distance is the average of the derivative of the prediction with respect to distance.
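In symbols, with $p_i = \operatorname{invlogit}(xb_i)$ the fitted probability for observation $i$ and $\beta_{\mathrm{distance}}$ the coefficient on distance,

$$
\mathrm{AME}(\mathrm{distance})
\;=\;
\frac{1}{N}\sum_{i=1}^{N}\frac{\partial p_i}{\partial\,\mathrm{distance}_i}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} p_i(1-p_i)\,\beta_{\mathrm{distance}}.
$$

The next few commands build exactly this average.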

 
. predict p, pr

. generate dpdxb = p*(1-p)

. generate dpdx = dpdxb*_b[distance]

. sum dpdx

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        dpdx |      3,000   -.0006228    .0003231  -.0009766  -.0000128

For the Jacobian, we must compute the partials of dpdx with respect to the model parameters. The partial with respect to the coefficient for distance has an extra piece.
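To see where the extra piece comes from (a derivation implicit in, but not spelled out by, the commands that follow), differentiate the per-observation effect $p_i(1-p_i)\beta_{\mathrm{distance}}$ with respect to a coefficient $\beta_j$, using $\partial p_i/\partial\beta_j = p_i(1-p_i)x_{ij}$:

$$
\frac{\partial}{\partial\beta_j}\bigl\{p_i(1-p_i)\,\beta_{\mathrm{distance}}\bigr\}
\;=\;
p_i(1-p_i)(1-2p_i)\,x_{ij}\,\beta_{\mathrm{distance}}
\;+\;
\begin{cases}
p_i(1-p_i) & \text{if } \beta_j = \beta_{\mathrm{distance}}\\
0 & \text{otherwise.}
\end{cases}
$$

The first term is what vecaccum accumulates from d2pdxb2 (note that p*(1-p)*(1-p) - p*p*(1-p) simplifies to $p(1-p)(1-2p)$), and the average of the second term, the mean of dpdxb, is added to the distance column of the Jacobian below with matrix Jac[1,3] = Jac[1,3] + r(mean).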

. generate d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)

. matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance

. matrix Jac = Jac*_b[distance]/e(N)

. sum dpdxb

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       dpdxb |      3,000    .1304451    .0676672   .0026901   .2045267

. matrix Jac[1,3] = Jac[1,3] + r(mean)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[1,4]
                      0b.         1.
                treatment  treatment   distance      _cons
     d2pdxb2            0 -.00020228   .1256217 -.00034567

. matrix list rJ

rJ[1,4]
                 outcome:   outcome:   outcome:   outcome:
                      0b.         1.
                treatment  treatment   distance      _cons
    distance            0 -.00020228   .1256217 -.00034567

. matrix list V

symmetric V[1,1]
            d2pdxb2
 d2pdxb2   2.084e-08

. matrix list rV

symmetric rV[1,1]
            distance
distance   2.084e-08
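As a final check (again, not part of the original output), the square root of V[1,1] should reproduce the delta-method SE that margins reported for distance, .0001444:

. display sqrt(V[1,1])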

Conclusion

In summary, I have shown how to compute discrete and continuous marginal effects along with their corresponding SE estimates using the delta method. This is essentially what margins does in all cases, except that it often uses numerical derivatives.