Note: This FAQ is for Stata 10 and older versions of Stata.

In Stata 11, the margins command replaced mfx.

When I run mfx, I am getting the warning message “warning: predict() expression unsuitable for standard-error calculation; option nose imposed.” What does that mean?

Title   Obtaining marginal effects without standard errors
Author May Boggess, StataCorp

Not every predict option for every estimation command is suitable for calculating the standard error of the marginal effects, so mfx checks if the predict option specified is suitable.

A marginal effect is the partial derivative of the prediction function f with respect to a covariate x_i. The mfx command calculates each of these derivatives numerically. This means that it uses the following approximation for each x_i:

       df      f(x_i+h) − f(x_i)
      ---- = --------------------
      dx_i            h

for an appropriately small change h in x_i, holding all the other covariates and the coefficients constant. mfx evaluates this derivative at the mean of each covariate or, if you have used the at() option, at the values specified there.
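
As a rough illustration of this forward-difference approximation (not mfx's internal code), the following commands compute the derivative by hand for a one-covariate logit model fit to the auto dataset that ships with Stata. The model, the scalar names, and the step size of 1e-4 are assumptions chosen for this sketch; mfx chooses its own h.

 . sysuse auto, clear
 . logit foreign mpg
 . * store the mean of the covariate, the default evaluation point
 . summarize mpg, meanonly
 . scalar xbar = r(mean)
 . * illustrative step size (an assumption; mfx picks its own)
 . scalar step = 1e-4
 . * prediction function f = Pr(foreign) at xbar and at xbar + step
 . scalar f0 = invlogit(_b[_cons] + _b[mpg]*xbar)
 . scalar f1 = invlogit(_b[_cons] + _b[mpg]*(xbar + step))
 . * forward-difference approximation to df/dx
 . display (f1 - f0)/step
 . * analytic derivative of the logit probability, for comparison
 . display _b[mpg]*f0*(1 - f0)
 . * marginal effect as reported by mfx
 . mfx

The two displayed values and the dy/dx reported by mfx should be close to one another, since all three evaluate the same derivative at the mean of mpg.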

The standard error of the marginal effect is computed by the delta method:

                       dM_i  '          dM_i
      Var(M_i)    =  --------  Var(B)  ------
                        db               db

where M_i is the marginal effect of the ith independent variable x_i, and the jth component of the vector dM_i/db is the partial derivative of M_i with respect to b_j, the coefficient of the jth independent variable. The delta method applies because the marginal effect, evaluated at a point, is a function of the coefficients b_j only.

To calculate dM_i/db_j, mfx uses the usual approximation:

       dM_i          M_i(x, b_j+hb) − M_i(x, b_j)
      ------  =  ---------------------------------
       db_j                      hb

where hb is a small change in b_j. Because this is a partial derivative, all other coefficients are held constant (at the values estimated by the estimation command), and all covariates are held constant at the values specified in the mfx command.
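
One way to cross-check a delta-method standard error is nlcom, which applies the delta method to a nonlinear expression of the coefficients. The sketch below assumes an illustrative one-covariate logit model on the auto dataset (not part of this FAQ's examples), for which the marginal effect at the mean has the closed form b*p*(1-p). The macro name xbar and the expression label me are arbitrary.

 . sysuse auto, clear
 . logit foreign mpg
 . * hold the covariate fixed at its mean
 . summarize mpg, meanonly
 . local xbar = r(mean)
 . * marginal effect of mpg at the mean, written as a function of the coefficients only
 . nlcom (me: _b[mpg]*invlogit(_b[_cons] + _b[mpg]*`xbar')*(1 - invlogit(_b[_cons] + _b[mpg]*`xbar')))
 . * standard error from mfx, for comparison
 . mfx

Because the covariate value is held fixed, the expression depends on the coefficients only through e(b), which is the situation in which the delta-method calculation is appropriate.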

Problems arise in computing dM_i/db_j when the prediction function f depends on the coefficients in a less than straightforward manner. Let’s look at an example:

 . use http://www.stata-press.com/data/r10/hsng2, clear
 (1980 Census housing data)

 . ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4) 

 Instrumental variables (2SLS) regression               Number of obs =      50
                                                        Wald chi2(2)  =   90.76
                                                        Prob > chi2   =  0.0000
                                                        R-squared     =  0.5989
                                                        Root MSE      =  22.166
 
 ------------------------------------------------------------------------------
         rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
      hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
     pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
        _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
 ------------------------------------------------------------------------------
 Instrumented:  hsngval
 Instruments:   pcturban faminc reg2 reg3 reg4

 . mfx, predict(pr(200,300)) diagnostics(vce)

 Check prediction function does not depend on dependent variables, 
 covariance matrix, or stored scalars.
 dfdx: 

   .00001126  .00040971


 dfdx, after resetting dependent variables, covariance matrix, and stored scalars: 

   .  .


 Relative difference = .

 warning: predict() expression pr(200,300) unsuitable for standard-error calculation;
 option nose imposed


 Marginal effects after ivregress
       y  = Pr(200<rent<300) (predict, pr(200,300))
          =   .9399585
 -------------------------------------------------------------------------------
                         variable |          dy/dx                 X
 ---------------------------------+---------------------------------------------
                          hsngval |        .0000113              48484
                         pcturban |        .0004097            66.9491
 -------------------------------------------------------------------------------

The diagnostics(vce) option shows us how mfx came to the conclusion that standard errors are not appropriate.

mfx checks this by setting the covariance matrix to the identity matrix, setting all the dependent variables to zero, and blanking out various scalars stored in the estimates. Then it recalculates the marginal effect. If it gets the same results it got the first time, it concludes that the prediction function did not depend on any of those quantities. But, if the results changed, then mfx concludes there is a problem.

In our example, the results certainly changed. What happened? The function pr(200,300) depends on e(rmse), which is stored as a scalar and thus was blanked out. That’s why we got an empty answer the second time around. And it really is a problem for the prediction function to depend on e(rmse): e(rmse) itself depends on the coefficients, but when mfx calculates the derivative of f with respect to a coefficient, it assumes that f depends on the coefficients only through the coefficient matrix e(b).
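
To see the dependence on e(rmse) concretely, the prediction can be rebuilt by hand. The sketch below assumes that pr(200,300) is the normal probability normal((300 - xb)/sigma) - normal((200 - xb)/sigma) with sigma = e(rmse), and it plugs in the covariate means shown in the X column of the mfx table. The scalar names xbhat and sigma are arbitrary.

 . use http://www.stata-press.com/data/r10/hsng2, clear
 . ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4)
 . * linear prediction at the means reported by mfx
 . scalar xbhat = _b[_cons] + _b[hsngval]*48484 + _b[pcturban]*66.9491
 . * the stored scalar that mfx blanks out during its check
 . scalar sigma = e(rmse)
 . * Pr(200 < rent < 300) under the assumed normal formula
 . display normal((300 - xbhat)/sigma) - normal((200 - xbhat)/sigma)
 . * analytic derivative of that probability with respect to hsngval
 . display (_b[hsngval]/sigma)*(normalden((200 - xbhat)/sigma) - normalden((300 - xbhat)/sigma))

The two displayed values can be compared with the prediction and the hsngval marginal effect in the mfx output above. Both expressions use sigma, so perturbing a coefficient while leaving sigma fixed ignores part of the way the prediction depends on the coefficients.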

What if the two matrices were the same but contained empty values, thus making the relative difference nonzero? It is probably a good idea to figure out why those marginal effects came up empty. Often it is because you are trying to evaluate the marginal effect at a point where the values of the prediction function are not very reasonable. So I would use the at() option (together with nose to save some time) and calculate the marginal effects at points near the one you originally tried. Sometimes a small change in the point will make a big difference. As a last resort, you can use the varlist() option of mfx so that the marginal effects that were empty are not calculated, and the check will then pass.
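
In terms of syntax, those suggestions might look like the following. The evaluation point hsngval=50000 is a hypothetical value chosen only to illustrate at(), and restricting varlist() to pcturban is likewise only an illustration. In the ivregress example the failed check comes from e(rmse), so neither command is expected to make the check pass there.

 . use http://www.stata-press.com/data/r10/hsng2, clear
 . ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4)
 . * marginal effects at a nearby point, skipping standard errors to save time
 . mfx, predict(pr(200,300)) at(mean hsngval=50000) nose
 . * compute the marginal effect for pcturban only and rerun the check
 . mfx, predict(pr(200,300)) varlist(pcturban) diagnostics(vce)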

What if the difference between the two was very, very small, say, 10^-10? This is not small enough to pass the test, but surely this difference is minor. That may well be true. I would try the same approach as I did in the previous paragraph, using the at() option to change the point, and see if that makes a difference. Here is an example like that:

 . use http://www.stata-press.com/data/r10/abdata, clear

 . set matsize 800

 . xtabond n l(0/1).w l(0/2).(k ys) yr1980-yr1984 year, lags(2) noconstant 

 Arellano-Bond dynamic panel-data estimation  Number of obs         =       611
 Group variable: id                           Number of groups      =       140
 Time variable: year
                                              Obs per group:    min =         4
                                                                avg =  4.364286
                                                                max =         6

 Number of instruments =     40               Wald chi2(15)         =   1627.13
                                              Prob > chi2           =    0.0000
 One-step results
 ------------------------------------------------------------------------------
            n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
            n |
          L1. |   .7080866   .1455545     4.86   0.000     .4228051    .9933681
          L2. |  -.0886343   .0448479    -1.98   0.048    -.1765346    -.000734
            w |
          --. |   -.605526   .0661129    -9.16   0.000     -.735105   -.4759471
          L1. |   .4096717   .1081258     3.79   0.000     .1977491    .6215943
            k |
          --. |   .3556407   .0373536     9.52   0.000     .2824289    .4288525
          L1. |  -.0599314   .0565918    -1.06   0.290    -.1708493    .0509865
          L2. |  -.0211709   .0417927    -0.51   0.612    -.1030831    .0607412
           ys |
          --. |   .6264699   .1348009     4.65   0.000     .3622651    .8906748
          L1. |  -.7231751   .1844696    -3.92   0.000    -1.084729   -.3616214
          L2. |   .1179079   .1440154     0.82   0.413    -.1643572     .400173
       yr1980 |   .0113066   .0140625     0.80   0.421    -.0162554    .0388686
       yr1981 |  -.0212183   .0206559    -1.03   0.304    -.0617031    .0192665
       yr1982 |   -.034952    .022122    -1.58   0.114    -.0783103    .0084063
       yr1983 |  -.0287094   .0251536    -1.14   0.254    -.0780096    .0205909
       yr1984 |   -.014862   .0284594    -0.52   0.602    -.0706414    .0409174
 ------------------------------------------------------------------------------
 Instruments for differenced equation
         GMM-type: L(2/.).n
         Standard: D.w LD.w D.k LD.k L2D.k D.ys LD.ys L2D.ys D.yr1980 D.yr1981 D.yr1982
                   D.yr1983 D.yr1984


 . mfx, at(mean L.n=-0.06) diag(vce)

 Check prediction function does not depend on dependent variables, 
 covariance matrix, or stored scalars.
 dfdx: 

    .70808656  -.08863433  -.60552603   .40967169   .35564067   -.0599314  -.02117091
    .62646995   -.7231751   .11790789   .01130656  -.02121832  -.03495199  -.02870935
   -.01486203


 dfdx, after resetting dependent variables, covariance matrix, and stored scalars: 

    .70808656  -.08863433  -.60552603   .40967169   .35564067   -.0599314  -.02117091
    .62646995   -.7231751   .11790789   .01130656  -.02121832  -.03495199  -.02870935
   -.01486203


 Relative difference = 0

 Marginal effects after xtabond
       y  = Linear prediction (predict)
          = -.84471245
 ------------------------------------------------------------------------------
 variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
 ---------+--------------------------------------------------------------------
      L.n |   .7080866      .14555    4.86   0.000   .422805  .993368      -.06
     L2.n |  -.0886343      .04485   -1.98   0.048  -.176535 -.000734   1.09584
        w |   -.605526      .06611   -9.16   0.000  -.735105 -.475947   3.14957
      L.w |   .4096717      .10813    3.79   0.000   .197749  .621594   3.12676
        k |   .3556407      .03735    9.52   0.000   .282429  .428852  -.502119
      L.k |  -.0599314      .05659   -1.06   0.290  -.170849  .050987  -.429181
     L2.k |  -.0211709      .04179   -0.51   0.612  -.103083  .060741  -.391757
       ys |   .6264699       .1348    4.65   0.000   .362265  .890675   4.59385
     L.ys |  -.7231751      .18447   -3.92   0.000  -1.08473 -.361621   4.62901
    L2.ys |   .1179079      .14402    0.82   0.413  -.164357  .400173   4.66607
   yr1980*|   .0113066      .01406    0.80   0.421  -.016255  .038869   .225859
   yr1981*|  -.0212183      .02066   -1.03   0.304  -.061703  .019266   .229133
   yr1982*|   -.034952      .02212   -1.58   0.114   -.07831  .008406   .229133
   yr1983*|  -.0287094      .02515   -1.14   0.254   -.07801  .020591    .12766
   yr1984*|   -.014862      .02846   -0.52   0.602  -.070641  .040917   .057283
 ------------------------------------------------------------------------------
 (*) dy/dx is for discrete change of dummy variable from 0 to 1

If you want to force mfx to compute the standard error of the marginal effect despite failing the above test, you can do so by using the force option. If you can’t find a better point, but the difference was very small for each point you tried, and you have convinced yourself by examining the formula for the prediction function that it shouldn’t depend on anything but the covariate values and the coefficient matrix e(b), then you may be confident enough to use force.

But remember, if diag(vce) shows a large relative difference (say, bigger than 10^-2), the standard errors given by using force will probably be wrong because mfx cannot take into account any dependence on the coefficients that is not through e(b).
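
For reference, force is given like any other mfx option. The commands below refit the ivregress model from the first example only to illustrate the syntax. Because pr(200,300) depends on e(rmse), the standard errors reported for that model should not be trusted.

 . use http://www.stata-press.com/data/r10/hsng2, clear
 . ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4)
 . * override the suitability check and compute standard errors anyway
 . mfx, predict(pr(200,300)) force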