Note: This FAQ is for Stata 10 and older versions of Stata. In Stata 11,
the **margins** command replaced **mfx**.

Title: Obtaining marginal effects without standard errors

Author: May Boggess, StataCorp

Date: April 2004; minor revisions July 2007

Not every predict option for every estimation command is suitable for calculating the standard error of the marginal effects, so **mfx** checks whether the predict option you specify is suitable.

A marginal effect is the partial derivative of the prediction function f with respect to each covariate x. The mfx command calculates each of these derivatives numerically. This means that it uses the following approximation for each x_i:

$$
\frac{\partial f}{\partial x_i} \approx \frac{f(x_i + h) - f(x_i)}{h}
$$

where h is an appropriately small change in x_i, and all of the other covariates and the coefficients are held constant. mfx evaluates this derivative at the mean of each covariate or, if you have used the at() option, at the values specified there.
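The forward-difference approximation can be sketched in a few lines. Python is used here purely as illustration; this is not mfx's internal code. The probit-style prediction Φ(x'b), the evaluation point, the coefficients, and the fixed step h = 1e-6 are all hypothetical choices (mfx picks its own step size). Because the exact marginal effect of Φ(x'b) is φ(x'b)·b_i, the numerical result can be checked against it.

```python
import math
from typing import Callable, List


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def prediction(x: List[float], b: List[float]) -> float:
    """Hypothetical probit-style prediction f = Phi(x'b)."""
    return normal_cdf(sum(xi * bi for xi, bi in zip(x, b)))


def marginal_effect(f: Callable[[List[float], List[float]], float],
                    x: List[float], b: List[float], i: int,
                    h: float = 1e-6) -> float:
    """Forward-difference approximation of df/dx_i at the point x.

    Only x_i is shifted by h; the other covariates and all coefficients stay fixed.
    """
    x_shifted = list(x)
    x_shifted[i] += h
    return (f(x_shifted, b) - f(x, b)) / h


# Hypothetical evaluation point (the first entry plays the role of a constant term)
x_mean = [1.0, 0.5]
b = [0.2, -0.8]

numeric = marginal_effect(prediction, x_mean, b, i=1)

# For Phi(x'b), the exact marginal effect of x_i is phi(x'b) * b_i
xb = sum(xi * bi for xi, bi in zip(x_mean, b))
analytic = normal_pdf(xb) * b[1]
print(f"numeric = {numeric:.8f}, analytic = {analytic:.8f}")
```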

The standard error of the marginal effect is computed by the delta method:

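$$
\operatorname{Var}(M_i) = \left(\frac{\partial M_i}{\partial \mathbf{b}}\right)' \operatorname{Var}(\mathbf{b}) \left(\frac{\partial M_i}{\partial \mathbf{b}}\right)
$$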

where M_i is the marginal effect of the ith independent variable x_i, and the jth component of the vector dM_i/db is the partial derivative of M_i with respect to b_j, the coefficient of the jth independent variable. The delta method applies here because the marginal effect, evaluated at a fixed point, is a function of the coefficients b_j only.
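The delta-method formula is a quadratic form in the gradient. As a small worked illustration in Python (the gradient and covariance values are hypothetical, not taken from any estimation), with two coefficients the sum is 0.5²·0.04 + 2·0.5·(−1)·0.01 + (−1)²·0.09 = 0.09, giving a standard error of 0.3.

```python
import math
from typing import List


def delta_method_variance(grad: List[float], vcov: List[List[float]]) -> float:
    """Return g' V g, where g = dM_i/db (length k) and V = Var(b) (k x k)."""
    k = len(grad)
    return sum(grad[r] * vcov[r][c] * grad[c] for r in range(k) for c in range(k))


# Hypothetical gradient of one marginal effect and coefficient covariance matrix
grad = [0.5, -1.0]
vcov = [[0.04, 0.01],
        [0.01, 0.09]]

var_m = delta_method_variance(grad, vcov)
se_m = math.sqrt(var_m)
print(f"Var(M_i) = {var_m:.4f}, SE = {se_m:.4f}")
```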

To calculate dM_i/db_j, mfx uses the usual approximation:

$$
\frac{\partial M_i}{\partial b_j} \approx \frac{M_i(b_j + h_b) - M_i(b_j)}{h_b}
$$

where h_b is a small change in b_j, and each M_i is itself computed with the numerical approximation to df/dx_i described earlier. This is a partial derivative, so it is computed holding all other coefficients constant (at the values estimated by the estimation command) and holding all covariates constant at the values specified in the mfx command.
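Putting the two numerical steps together gives a standard error: perturb one coefficient at a time, recompute the marginal effect by finite differences, and feed the resulting gradient into the delta method. The Python sketch below is an illustration only, not mfx's implementation; the probit-style prediction, the point x, the coefficients, Var(b), and the step sizes h = 1e-5 and h_b = 1e-4 are hypothetical. For Φ(x'b) the gradient is also known in closed form, dM_1/db_j = φ(x'b)(−(x'b)·x_j·b_1 + [j = 1]), which serves as a check.

```python
import math
from typing import List


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def prediction(x: List[float], b: List[float]) -> float:
    """Hypothetical probit-style prediction Phi(x'b)."""
    return normal_cdf(sum(xi * bi for xi, bi in zip(x, b)))


def marginal_effect(x: List[float], b: List[float], i: int, h: float = 1e-5) -> float:
    """M_i: forward-difference df/dx_i with the coefficients held at b."""
    x_shifted = list(x)
    x_shifted[i] += h
    return (prediction(x_shifted, b) - prediction(x, b)) / h


def me_gradient(x: List[float], b: List[float], i: int, hb: float = 1e-4) -> List[float]:
    """dM_i/db_j for each j: shift one coefficient, keep x and the other coefficients fixed."""
    base = marginal_effect(x, b, i)
    grad = []
    for j in range(len(b)):
        b_shifted = list(b)
        b_shifted[j] += hb
        grad.append((marginal_effect(x, b_shifted, i) - base) / hb)
    return grad


def delta_method_se(grad: List[float], vcov: List[List[float]]) -> float:
    """sqrt(g' V g)."""
    k = len(grad)
    return math.sqrt(sum(grad[r] * vcov[r][c] * grad[c]
                         for r in range(k) for c in range(k)))


x = [1.0, 0.5]                       # hypothetical evaluation point
b = [0.2, -0.8]                      # hypothetical estimated coefficients
vcov = [[0.04, 0.01], [0.01, 0.09]]  # hypothetical Var(b)

grad = me_gradient(x, b, i=1)
se = delta_method_se(grad, vcov)

# Closed-form check for Phi(x'b): M_1 = phi(x'b) * b_1
xb = sum(xi * bi for xi, bi in zip(x, b))
analytic_grad = [normal_pdf(xb) * (-xb * x[j] * b[1] + (1.0 if j == 1 else 0.0))
                 for j in range(len(b))]
analytic_se = delta_method_se(analytic_grad, vcov)
print("numeric grad:", grad, "analytic grad:", analytic_grad)
print(f"SE (numeric) = {se:.5f}, SE (analytic) = {analytic_se:.5f}")
```

Note that the coefficients enter the prediction only through the vector b; the next example shows what goes wrong when that is not the case.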

Problems arise in computing dM_i/db_j when the prediction function f depends on the coefficients in a less-than-straightforward way. Let’s look at an example:

```
. use http://www.stata-press.com/data/r10/hsng2, clear
(1980 Census housing data)

. ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4)

Instrumental variables (2SLS) regression               Number of obs =      50
                                                       Wald chi2(2)  =   90.76
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.5989
                                                       Root MSE      =  22.166

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
    pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
       _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc reg2 reg3 reg4

. mfx, predict(pr(200,300)) diagnostics(vce)

Check prediction function does not depend on dependent variables, covariance matrix, or stored scalars.

dfdx:
  .00001126   .00040971

dfdx, after resetting dependent variables, covariance matrix, and stored scalars:
  .   .

Relative difference = .
warning: predict() expression pr(200,300) unsuitable for standard-error calculation; option nose imposed

Marginal effects after ivregress
      y  = Pr(200<rent<300) (predict, pr(200,300))
         =  .9399585
-------------------------------------------------------------------------------
variable                         |      dy/dx                 X
---------------------------------+---------------------------------------------
                         hsngval |   .0000113             48484
                        pcturban |   .0004097           66.9491
-------------------------------------------------------------------------------
```

The diagnostics(vce) option shows us how mfx came to the conclusion that standard errors are not appropriate.

mfx checks this by setting the covariance matrix to the identity matrix, setting all the dependent variables to zero, and blanking out various scalars stored in the estimates. Then it recalculates the marginal effects. If it gets the same results as the first time, it concludes that the prediction function does not depend on any of those quantities. If the results changed, mfx concludes that there is a problem.
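The logic of this check can be mimicked in a short Python sketch. Everything here is hypothetical: the dictionary standing in for stored estimation results, the two prediction functions, and the helper names. The comparison uses max_j |u_j − v_j| / (|v_j| + 1), modeled on the definition of Stata's mreldif() function, and it returns NaN (missing) when either vector contains a missing value, mirroring the "Relative difference = ." shown in the log. A prediction that uses only b passes with a difference of 0; one that also reads a stored scalar fails.

```python
import copy
import math


def mreldif(u, v):
    """max_j |u_j - v_j| / (|v_j| + 1), modeled on Stata's mreldif();
    returns NaN if either vector contains a missing (NaN) value."""
    if any(math.isnan(t) for t in u + v):
        return math.nan
    return max(abs(p - q) / (abs(q) + 1.0) for p, q in zip(u, v))


def marginal_effects(predict, x, res, h=1e-6):
    """Forward-difference dy/dx for every covariate at the point x."""
    base = predict(x, res)
    out = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        out.append((predict(shifted, res) - base) / h)
    return out


def vce_diagnostic(predict, x, res):
    """Recompute the marginal effects after resetting V, y, and stored scalars."""
    before = marginal_effects(predict, x, res)
    reset = copy.deepcopy(res)
    k = len(res["b"])
    reset["V"] = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]  # identity
    reset["y"] = [0.0] * len(res["y"])                              # dependent variable zeroed
    reset["scalars"] = {name: math.nan for name in res["scalars"]}  # scalars blanked out
    after = marginal_effects(predict, x, reset)
    return mreldif(before, after)


def predict_xb(x, res):
    """Uses only the coefficients b, so it should pass."""
    return sum(xi * bi for xi, bi in zip(x, res["b"]))


def predict_scaled(x, res):
    """Hypothetical prediction that also reads the stored scalar rmse, so it should fail."""
    return predict_xb(x, res) / res["scalars"]["rmse"]


res = {"b": [0.5, 2.0], "V": [[0.1, 0.0], [0.0, 0.2]],
       "y": [1.0, 2.0, 3.0], "scalars": {"rmse": 4.0, "N": 3.0}}
x = [1.0, 3.0]

rd_ok = vce_diagnostic(predict_xb, x, res)
rd_bad = vce_diagnostic(predict_scaled, x, res)
print("predict_xb:     relative difference =", rd_ok)
print("predict_scaled: relative difference =", rd_bad)
```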

In our example, the results certainly changed. What happened? The function pr(200,300) depends on e(rmse), which is stored as a scalar and so was blanked out. That’s why we got an empty answer the second time around. This dependence is a genuine problem: e(rmse) itself depends on the coefficients, but when mfx calculates the derivative of f with respect to a coefficient, it assumes that f depends on the coefficients only through the coefficient matrix e(b).
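To see the dependence concretely, here is a Python sketch that assumes pr(a,b) after this linear model is P(a < y < b) for y normal with mean x'b and standard deviation s = e(rmse), that is, Φ((b − x'b)/s) − Φ((a − x'b)/s). That formula is an assumption stated here, not a quotation from the Stata documentation. Using the rounded coefficients, Root MSE, and covariate means from the log above, it should approximately reproduce the .9399585 and .0000113 that mfx reported. Holding e(b) fixed but changing s changes the marginal effect, and blanking s out (NaN) makes it empty.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Rounded coefficients and Root MSE from the ivregress output
B = {"_cons": 120.7065, "hsngval": 0.0022398, "pcturban": 0.081516}
RMSE = 22.166
# Evaluation point: the covariate means in the "X" column of the mfx output
X_MEAN = {"hsngval": 48484.0, "pcturban": 66.9491}


def linear_index(x):
    """x'b, using only the coefficients in B."""
    return B["_cons"] + sum(B[name] * value for name, value in x.items())


def pr_interval(x, lower, upper, s):
    """Assumed form of pr(lower,upper): P(lower < y < upper) with y ~ N(x'b, s^2)."""
    xb = linear_index(x)
    return normal_cdf((upper - xb) / s) - normal_cdf((lower - xb) / s)


def dydx(x, name, s, h=1e-3):
    """Forward-difference marginal effect of one covariate on pr(200,300)."""
    shifted = dict(x)
    shifted[name] += h
    return (pr_interval(shifted, 200.0, 300.0, s) - pr_interval(x, 200.0, 300.0, s)) / h


p = pr_interval(X_MEAN, 200.0, 300.0, RMSE)
me_hsngval = dydx(X_MEAN, "hsngval", RMSE)
me_other_s = dydx(X_MEAN, "hsngval", 2.0 * RMSE)  # same e(b), different stored scalar
me_blank = dydx(X_MEAN, "hsngval", math.nan)      # stored scalar blanked out
print(f"Pr(200<rent<300) = {p:.4f}")
print(f"dy/dx(hsngval) = {me_hsngval:.3e}; s doubled: {me_other_s:.3e}; s blank: {me_blank}")
```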

What if the two matrices were the same but contained empty values, thus making the relative difference nonzero? It is probably a good idea to figure out why those marginal effects came up empty. Often it is because you are trying to evaluate the marginal effect at a point where the prediction function does not take reasonable values. So I would use the at() option (along with nose, to save time) and calculate the marginal effects at points near the one you were trying to use. Sometimes a small change in the point makes a big difference. As a last resort, you can use the varlist() option of mfx so that the marginal effects that came up empty are not calculated, and the test will pass.
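The idea of probing nearby points can be pictured with a toy Python example. The prediction log(b0 + b1·x) with b0 = −2 and b1 = 1 is hypothetical and is treated as missing (NaN) wherever b0 + b1·x ≤ 0. At x = 2 the forward difference needs f(2), which is missing, so the marginal effect comes up empty; a point slightly to the right gives a finite value. The helper probe_nearby is invented for illustration; in Stata the analogous step is simply re-running mfx with different at() values and nose.

```python
import math


def prediction(x: float, b0: float = -2.0, b1: float = 1.0) -> float:
    """Hypothetical prediction log(b0 + b1*x); NaN (missing) where b0 + b1*x <= 0."""
    index = b0 + b1 * x
    return math.log(index) if index > 0 else math.nan


def marginal_effect(f, x: float, h: float = 1e-6) -> float:
    """Forward-difference df/dx at x; NaN if either evaluation is missing."""
    return (f(x + h) - f(x)) / h


def probe_nearby(f, x0: float, offsets):
    """Evaluate the marginal effect at x0 + d for each offset d and flag finite results."""
    results = []
    for d in offsets:
        me = marginal_effect(f, x0 + d)
        results.append((x0 + d, me, math.isfinite(me)))
    return results


results = probe_nearby(prediction, 2.0, [0.0, -0.1, 0.1, 0.5])
for point, me, ok in results:
    print(f"x = {point:.2f}: dy/dx = {me:.4f}  {'ok' if ok else 'empty'}")
```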

What if the difference between the two was very, very small, say $10^{-10}$? This is not small enough to pass the test, but surely such a difference is minor. That may well be true. I would try the same approach as in the previous paragraph: use the at() option to change the point, and see whether that makes a difference. Here is an example in which evaluating at a different point gives a relative difference of exactly 0:

```
. use http://www.stata-press.com/data/r10/abdata, clear

. set matsize 800

. xtabond n l(0/1).w l(0/2).(k ys) yr1980-yr1984 year, lags(2) noconstant

Arellano-Bond dynamic panel-data estimation  Number of obs         =       611
Group variable: id                           Number of groups      =       140
Time variable: year
                                             Obs per group:    min =         4
                                                               avg =  4.364286
                                                               max =         6

Number of instruments =     40               Wald chi2(15)         =   1627.13
                                             Prob > chi2           =    0.0000
One-step results
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .7080866   .1455545     4.86   0.000     .4228051    .9933681
         L2. |  -.0886343   .0448479    -1.98   0.048    -.1765346    -.000734
           w |
         --. |   -.605526   .0661129    -9.16   0.000     -.735105   -.4759471
         L1. |   .4096717   .1081258     3.79   0.000     .1977491    .6215943
           k |
         --. |   .3556407   .0373536     9.52   0.000     .2824289    .4288525
         L1. |  -.0599314   .0565918    -1.06   0.290    -.1708493    .0509865
         L2. |  -.0211709   .0417927    -0.51   0.612    -.1030831    .0607412
          ys |
         --. |   .6264699   .1348009     4.65   0.000     .3622651    .8906748
         L1. |  -.7231751   .1844696    -3.92   0.000    -1.084729   -.3616214
         L2. |   .1179079   .1440154     0.82   0.413    -.1643572     .400173
      yr1980 |   .0113066   .0140625     0.80   0.421    -.0162554    .0388686
      yr1981 |  -.0212183   .0206559    -1.03   0.304    -.0617031    .0192665
      yr1982 |   -.034952    .022122    -1.58   0.114    -.0783103    .0084063
      yr1983 |  -.0287094   .0251536    -1.14   0.254    -.0780096    .0205909
      yr1984 |   -.014862   .0284594    -0.52   0.602    -.0706414    .0409174
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).n
        Standard: D.w LD.w D.k LD.k L2D.k D.ys LD.ys L2D.ys D.yr1980
                  D.yr1981 D.yr1982 D.yr1983 D.yr1984

. mfx, at(mean L.n=-0.06) diag(vce)

Check prediction function does not depend on dependent variables, covariance matrix, or stored scalars.

dfdx:
  .70808656  -.08863433  -.60552603   .40967169   .35564067   -.0599314  -.02117091
  .62646995   -.7231751   .11790789   .01130656  -.02121832  -.03495199  -.02870935
 -.01486203

dfdx, after resetting dependent variables, covariance matrix, and stored scalars:
  .70808656  -.08863433  -.60552603   .40967169   .35564067   -.0599314  -.02117091
  .62646995   -.7231751   .11790789   .01130656  -.02121832  -.03495199  -.02870935
 -.01486203

Relative difference = 0

Marginal effects after xtabond
      y  = Linear prediction (predict)
         = -.84471245
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     L.n |   .7080866      .14555    4.86   0.000   .422805  .993368      -.06
    L2.n |  -.0886343      .04485   -1.98   0.048  -.176535 -.000734   1.09584
       w |   -.605526      .06611   -9.16   0.000  -.735105 -.475947   3.14957
     L.w |   .4096717      .10813    3.79   0.000   .197749  .621594   3.12676
       k |   .3556407      .03735    9.52   0.000   .282429  .428852  -.502119
     L.k |  -.0599314      .05659   -1.06   0.290  -.170849  .050987  -.429181
    L2.k |  -.0211709      .04179   -0.51   0.612  -.103083  .060741  -.391757
      ys |   .6264699       .1348    4.65   0.000   .362265  .890675   4.59385
    L.ys |  -.7231751      .18447   -3.92   0.000  -1.08473 -.361621   4.62901
   L2.ys |   .1179079      .14402    0.82   0.413  -.164357  .400173   4.66607
 yr1980*|   .0113066      .01406    0.80   0.421  -.016255  .038869   .225859
 yr1981*|  -.0212183      .02066   -1.03   0.304  -.061703  .019266   .229133
 yr1982*|   -.034952      .02212   -1.58   0.114   -.07831  .008406   .229133
 yr1983*|  -.0287094      .02515   -1.14   0.254   -.07801  .020591    .12766
 yr1984*|   -.014862      .02846   -0.52   0.602  -.070641  .040917   .057283
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
```

If you want mfx to compute the standard error of the marginal effect despite failing the above test, you can use the force option. If you cannot find a better point, but the difference was very small at every point you tried, and you have convinced yourself, by examining the formula for the prediction function, that it should not depend on anything but the covariate values and the coefficient matrix e(b), then you may be confident enough to use force.

But remember: if **diag(vce)** shows a large relative difference (say, bigger than $10^{-2}$), the standard errors obtained with force will probably be wrong, because mfx cannot take into account any dependence on the coefficients that does not go through e(b).