
How are average marginal effects and their standard errors computed by margins using the delta method?

Title:   Compute standard errors with margins
Author:  Jeff Pitblado, StataCorp

In the following, I use the nofvlabel option so that the output aligns with the expressions I use. nofvlabel is a display option common to margins and the estimation commands. It was introduced in Stata 13, which displays value labels for factor variables by default.

Introduction

Here is an example using logit:

. webuse margex
(Artificial data for margins)

. logit outcome i.treatment distance, nofvlabel

Iteration 0:   log likelihood = -1366.0718  
Iteration 1:   log likelihood = -1257.5623  
Iteration 2:   log likelihood = -1244.2136  
Iteration 3:   log likelihood = -1242.8796  
Iteration 4:   log likelihood = -1242.8245  
Iteration 5:   log likelihood = -1242.8245  

Logistic regression                                     Number of obs =  3,000
                                                        LR chi2(2)    = 246.49
                                                        Prob > chi2   = 0.0000
Log likelihood = -1242.8245                             Pseudo R2     = 0.0902

------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |    1.42776    .113082    12.63   0.000     1.206124    1.649397
    distance |  -.0047747   .0011051    -4.32   0.000    -.0069406   -.0026088
       _cons |  -2.337762   .0962406   -24.29   0.000     -2.52639   -2.149134
------------------------------------------------------------------------------

I will show how margins computes standard errors (SEs) of average marginal effects (AMEs). The remaining discussion has two parts. The first part describes how to compute AMEs and their SE estimates for factor variables; the second part concerns continuous variables.

Factor variables

Because the AME of a two-level factor variable is just the difference between the two predictive margins, we start by computing the SEs for predictive margins. Here we use margins to compute the predictive margins for the two levels of treatment:

. margins treatment, nofvlabel

Predictive margins                                       Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   treatment |
          0  |   .0791146   .0069456    11.39   0.000     .0655016    .0927277
          1  |   .2600204   .0111772    23.26   0.000     .2381135    .2819272
------------------------------------------------------------------------------

We make copies of two matrices from margins's stored results to compare later. r(Jacobian) is the Jacobian matrix, which will be explained below. r(V) is the estimated variance matrix that corresponds to the reported predictive margins. The square roots of its diagonal elements are the values reported above in the column labeled “Delta-method std. err.”.

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00694557

. display sqrt(rV[2,2])
.01117718

The delta method allows us to obtain the appropriate SEs of any smooth function of the fitted model parameters. It amounts to pre- and postmultiplying the estimated variance matrix of the fitted model parameters by a Jacobian matrix. The Jacobian is the matrix of partial derivatives of the statistics of interest with respect to each of the fitted model parameters.
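In symbols, if $\hat\beta$ denotes the fitted model parameters with estimated variance matrix $V(\hat\beta)$, and $m(\beta)$ is the vector of statistics of interest (here, the two predictive margins), the delta-method variance estimate is

$$
\widehat{\operatorname{Var}}\{m(\hat\beta)\} \;=\; J\,V(\hat\beta)\,J',
\qquad
J \;=\; \left.\frac{\partial m(\beta)}{\partial \beta'}\right|_{\beta=\hat\beta}
$$

This is the calculation reproduced below by the command matrix V = Jac*e(V)*Jac'.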

Before we start taking derivatives, let's see how the predictive margins are essentially computed. This will provide us with the formulas from which we will work out the derivatives that go into the Jacobian matrix. In the following, p0 is the variable used to re-create the predictive margin for 0.treatment, and p1 corresponds to 1.treatment. The means of these variables are the predictive margins.

. generate p0 = invlogit(_b[0b.treatment] + _b[distance]*distance + _b[_cons])

. generate p1 = invlogit(_b[1.treatment] + _b[distance]*distance + _b[_cons])

. sum p0 p1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          p0 |      3,000    .0791146    .0212555   .0026954   .0879477
          p1 |      3,000    .2600204    .0688961     .011143   .2867554

Now we will make the calculations necessary to reproduce the Jacobian matrix, which we will then use to reproduce the estimated variance matrix. The rows identify what we are taking the partial derivative of; the columns identify what we are taking the partial derivative with respect to. Thus the rows correspond with the predictive margins, and the columns correspond with the fitted model parameters in the logistic regression.
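Concretely (this formula is implied by, though not spelled out in, the output below), the predictive margin for treatment level $t$ is $\frac{1}{N}\sum_i \operatorname{invlogit}(xb_{ti})$ with $xb_{ti} = \beta_{t.\mathrm{treatment}} + \beta_{\mathrm{distance}}\,\mathrm{distance}_i + \beta_{\_\mathrm{cons}}$. Because $\frac{d}{dz}\operatorname{invlogit}(z) = \operatorname{invlogit}(z)\{1-\operatorname{invlogit}(z)\}$, the Jacobian entry in row $t$ and the column for coefficient $\beta_j$ is

$$
\frac{\partial}{\partial \beta_j}\,\frac{1}{N}\sum_{i=1}^{N} p_{ti}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} p_{ti}(1-p_{ti})\,x_{ij},
$$

where $x_{ij}$ is 0 or 1 for the treatment indicators, $\mathrm{distance}_i$ for $\beta_{\mathrm{distance}}$, and 1 for the constant. The vecaccum calls below accumulate exactly these sums, and dividing by e(N) turns them into averages.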

. generate dp0dxb = p0*(1-p0)

. generate dp1dxb = p1*(1-p1)

. generate zero = 0

. generate one = 1

. matrix vecaccum J0 = dp0dxb zero zero distance

. matrix vecaccum J1 = dp1dxb zero one distance

. matrix Jac = J0/e(N) \ J1/e(N)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[2,4]
             zero       zero   distance      _cons
dp0dxb          0          0  .74390659  .07240388
dp1dxb          0  .18766468  2.1907626  .18766468

. matrix list rJ

rJ[2,4]
               outcome:   outcome:   outcome:   outcome:
                    0b.         1.                      
             treatment  treatment   distance      _cons
0.treatment          0          0  .74390659  .07240388
1.treatment          0  .18766468  2.1907626  .18766468

. matrix list V

symmetric V[2,2]
            dp0dxb      dp1dxb
dp0dxb   .00004824
dp1dxb  -1.181e-07   .00012493

. matrix list rV

symmetric rV[2,2]
                      0.          1.
              treatment   treatment
0.treatment   .00004824
1.treatment  -1.181e-07   .00012493

Now let's compute the marginal effect of treatment. margins keeps a placeholder for the base level of treatment; the corresponding Jacobian and variance elements are simply set to 0.

. margins, dydx(treatment) nofvlabel

Average marginal effects                                 Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  1.treatment

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
 1.treatment |   .1809057   .0131684    13.74   0.000     .1550961    .2067153
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. matrix list r(Jacobian)

r(Jacobian)[2,4]
                 outcome:   outcome:   outcome:   outcome:
                      0b.         1.
                treatment  treatment   distance      _cons
0b.treatment            0          0          0          0
 1.treatment            0  .18766468   1.446856  .11526081

. matrix list r(V)

symmetric r(V)[2,2]
                      0b.         1.
                treatment  treatment
0b.treatment            0
 1.treatment            0  .00017341

The marginal effect of treatment is just the difference between the predictive margins. Here we take advantage of the fact that the difference of the means is the mean of the differences.
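In symbols,

$$
\frac{1}{N}\sum_{i=1}^{N} p_{1i} \;-\; \frac{1}{N}\sum_{i=1}^{N} p_{0i}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} \bigl(p_{1i}-p_{0i}\bigr),
$$

and because differentiation is linear, the Jacobian of this difference is simply the difference of the two Jacobian rows computed above, $J_{\mathrm{diff}} = J_1 - J_0$, which is what the (-1, 1)*Jac multiplication below produces.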

. generate pdiff = p1 - p0

. sum pdiff

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       pdiff |      3,000    .1809057    .0476499   .0084476   .1988077

The Jacobian for this difference is composed from the previous calculations, ignoring the placeholder for the base level of treatment. Notice that the elements of Jdiff match those of the second row of r(Jacobian) above.

. matrix Jdiff = (-1, 1)*Jac

. matrix list Jdiff

Jdiff[1,4]
         zero       zero   distance      _cons
r1          0  .18766468   1.446856  .11526081

. matrix Vdiff = Jdiff*e(V)*Jdiff'

. matrix list Vdiff

symmetric Vdiff[1,1]
           r1
r1  .00017341
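As a check (this command is not part of the original listing), the square root of the single element of Vdiff should reproduce the delta-method SE that margins reported above for 1.treatment, .0131684:

. display sqrt(Vdiff[1,1])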

Continuous variables

The marginal effect of a continuous variable is essentially computed the same way, but there are more derivatives involved. Here we have margins compute the AME of distance and save copies of r(Jacobian) and r(V).

. margins, dydx(distance)

Average marginal effects                                 Number of obs = 3,000
Model VCE: OIM

Expression: Pr(outcome), predict()
dy/dx wrt:  distance

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    distance |  -.0006228   .0001444    -4.31   0.000    -.0009058   -.0003399
------------------------------------------------------------------------------

. matrix rJ = r(Jacobian)

. matrix rV = r(V)

. display sqrt(rV[1,1])
.00014436

The AME of distance is the average of the derivative of the prediction with respect to distance.
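In symbols, with $p_i = \operatorname{invlogit}(xb_i)$ the fitted probability for observation $i$ and $\beta_{\mathrm{distance}}$ the coefficient on distance,

$$
\mathrm{AME}(\mathrm{distance})
\;=\;
\frac{1}{N}\sum_{i=1}^{N}\frac{\partial p_i}{\partial\,\mathrm{distance}_i}
\;=\;
\frac{1}{N}\sum_{i=1}^{N} p_i(1-p_i)\,\beta_{\mathrm{distance}}.
$$

The next few commands build exactly this average.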

 
. predict p, pr

. generate dpdxb = p*(1-p)

. generate dpdx = dpdxb*_b[distance]

. sum dpdx

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        dpdx |      3,000   -.0006228    .0003231  -.0009766  -.0000128

For the Jacobian, we must compute the partials of dpdx with respect to the model parameters. The partial with respect to the coefficient for distance has an extra piece.
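To see where the extra piece comes from (a derivation implicit in, but not spelled out by, the commands that follow), differentiate the per-observation effect $p_i(1-p_i)\beta_{\mathrm{distance}}$ with respect to a coefficient $\beta_j$, using $\partial p_i/\partial\beta_j = p_i(1-p_i)x_{ij}$:

$$
\frac{\partial}{\partial\beta_j}\bigl\{p_i(1-p_i)\,\beta_{\mathrm{distance}}\bigr\}
\;=\;
p_i(1-p_i)(1-2p_i)\,x_{ij}\,\beta_{\mathrm{distance}}
\;+\;
\begin{cases}
p_i(1-p_i) & \text{if } \beta_j = \beta_{\mathrm{distance}}\\
0 & \text{otherwise.}
\end{cases}
$$

The first term is what vecaccum accumulates from d2pdxb2 (note that p*(1-p)*(1-p) - p*p*(1-p) simplifies to $p(1-p)(1-2p)$), and the average of the second term, the mean of dpdxb, is added to the distance column of the Jacobian below with matrix Jac[1,3] = Jac[1,3] + r(mean).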

. generate d2pdxb2 = p*(1-p)*(1-p) - p*p*(1-p)

. matrix vecaccum Jac = d2pdxb2 0b.treatment 1.treatment distance

. matrix Jac = Jac*_b[distance]/e(N)

. sum dpdxb

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       dpdxb |      3,000    .1304451    .0676672   .0026901   .2045267

. matrix Jac[1,3] = Jac[1,3] + r(mean)

. matrix V = Jac*e(V)*Jac'

. matrix list Jac

Jac[1,4]
                      0b.         1.
                treatment  treatment   distance      _cons
     d2pdxb2            0 -.00020228   .1256217 -.00034567

. matrix list rJ

rJ[1,4]
                 outcome:   outcome:   outcome:   outcome:
                      0b.         1.
                treatment  treatment   distance      _cons
    distance            0 -.00020228   .1256217 -.00034567

. matrix list V

symmetric V[1,1]
            d2pdxb2
 d2pdxb2   2.084e-08

. matrix list rV

symmetric rV[1,1]
            distance
distance   2.084e-08
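As a final check (again, not part of the original output), the square root of V[1,1] should reproduce the delta-method SE that margins reported for distance, .0001444:

. display sqrt(V[1,1])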

Conclusion

In summary, I have shown how to compute discrete and continuous marginal effects along with their corresponding SE estimates using the delta method. This is essentially what margins does in all cases, except that it often uses numerical derivatives.