Note: This FAQ is for Stata 10 and older versions of Stata. In Stata 11, the margins command replaced mfx.

When I run mfx, I am getting the error message “predict() option unsuitable for marginal effects”. What does that mean?

Title   predict() option unsuitable for marginal effects
Author May Boggess, StataCorp
Date April 2004; updated February 2005

Not every predict() option for every estimation command is suitable for calculating marginal effects with the command mfx, so mfx checks that the predict() option specified is suitable.

A marginal effect is the partial derivative of the prediction function f with respect to each covariate x. The command mfx calculates each of these derivatives numerically. This means that it uses the following approximation for each x_i:

   df        f(x_i + h) − f(x_i)
  ----  ≈  ---------------------
  dx_i               h

for an appropriately small change h in x_i, holding all the other covariates and the coefficients constant. mfx then evaluates this derivative at the mean of each covariate or, if you have used the at() option, at the values specified there.

This formula is appropriate only if the prediction function depends on nothing but the values of the covariates and their coefficients. If the prediction also depends on something else, such as the response or the other observations in the data, then changing x_i while holding the other covariates and the coefficients constant does not pin down a single value of f, and the approximation cannot be trusted.

What is an example of a predict() option that depends on something else? Well, let’s look at a prediction function that depends on the value of the response: residuals.

 . sysuse auto, clear
 (1978 Automobile Data)
   
 . areg mpg weight gear, absorb(rep78)
 
                                                        Number of obs =      69
                                                        F(  2,    62) =   41.64
                                                        Prob > F      =  0.0000
                                                        R-squared     =  0.6734
                                                        Adj R-squared =  0.6418
                                                        Root MSE      =  3.5109
 
 ------------------------------------------------------------------------------
          mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
       weight |  -.0051031   .0009206    -5.54   0.000    -.0069433    -.003263
   gear_ratio |    .901478   1.565552     0.58   0.567    -2.228015    4.030971
        _cons |   34.05889   7.056383     4.83   0.000     19.95338     48.1644
 -------------+----------------------------------------------------------------
        rep78 |          F(4, 62) =      1.117   0.356           (5 categories)
 
 . mfx, predict(residuals)
 
 predict() expression residuals unsuitable for marginal-effect calculation
 r(119);

To see how mfx came to that conclusion, we use the diagnostics(beta) option:

 . mfx, predict(residuals) diagnostics(beta)

          Predict into observation 1 = 1.918876
       Predict into last observation = -5.640898
 Predict into all observations: mean = -1.207e-17
   Predict into all observations: sd = 5.131774
 predict() expression residuals unsuitable for marginal-effect calculation
 r(119);

To check whether the prediction depends on something it should not, mfx uses the predict command in three ways, each time after replacing the covariate values with the required values. First, it predicts into the first observation. Second, it predicts into the last observation and checks that the two predictions are the same. Third, it predicts into all the observations and checks that the standard deviation of these predicted values is essentially zero. If the predict() option passes these tests, mfx concludes that the marginal effect will be calculated correctly. In the output above, the two predictions differ (1.918876 versus -5.640898) and the standard deviation is 5.131774, so residuals fails.
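
The same check can be roughly reproduced by hand. The following is only a sketch of the idea, not mfx's actual code: it assumes the required values are the estimation-sample means, and the variable name r_at_mean is made up for illustration. Output is not shown; you would expect the first and last predictions to differ and the standard deviation to be far from zero, in line with the diagnostics above.

 . sysuse auto, clear
 . quietly areg mpg weight gear_ratio, absorb(rep78)
 . * work on a temporary copy of the estimation sample
 . preserve
 . keep if e(sample)
 . * overwrite each covariate with its mean in every observation
 . quietly summarize weight
 . quietly replace weight = r(mean)
 . quietly summarize gear_ratio
 . quietly replace gear_ratio = r(mean)
 . * predict residuals everywhere, then compare obs 1, the last obs, and the sd
 . predict double r_at_mean, residuals
 . display r_at_mean[1] "  " r_at_mean[_N]
 . summarize r_at_mean
 . restore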

For another example of an unsuitable predict() option, let’s look at one that depends on other observations used in the estimation command:

 . webuse lowbirth, clear
 (Applied Logistic Regression, Hosmer & Lemeshow)
 
 . clogit low lwt ptd, group(pairid)
 
 Iteration 0:   log likelihood = -36.962156
 Iteration 1:   log likelihood = -34.637228
 Iteration 2:   log likelihood = -34.569847
 Iteration 3:   log likelihood = -34.569638
 
 Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                   LR chi2(2)      =       8.49
                                                   Prob > chi2     =     0.0143
 Log likelihood = -34.569638                       Pseudo R2       =     0.1094
 
 ------------------------------------------------------------------------------
          low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
          lwt |  -.0084936   .0067344    -1.26   0.207    -.0216928    .0047056
          ptd |   1.270864    .570099     2.23   0.026      .153491    2.388238
 ------------------------------------------------------------------------------
 
 . mfx, predict(pc1) diag(beta)
 
          Predict into observation 1 = .22979578
       Predict into last observation = .34900767
 Predict into all observations: mean = .5
   Predict into all observations: sd = 0
 predict() expression pc1 unsuitable for marginal-effect calculation
 r(119);

The prediction statistic pc1, following clogit, is the probability of a positive outcome, conditional on one positive outcome in the group. This means that the prediction depends on the group. We can see this more clearly if we calculate this probability by hand:

 . webuse lowbirth, clear
 (Applied Logistic Regression, Hosmer & Lemeshow)
 
 . clogit low lwt ptd, group(pairid)
 
 Iteration 0:   log likelihood = -36.962156
 Iteration 1:   log likelihood = -34.637228
 Iteration 2:   log likelihood = -34.569847
 Iteration 3:   log likelihood = -34.569638
 
 Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                   LR chi2(2)      =       8.49
                                                   Prob > chi2     =     0.0143
 Log likelihood = -34.569638                       Pseudo R2       =     0.1094
 
 ------------------------------------------------------------------------------
          low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
          lwt |  -.0084936   .0067344    -1.26   0.207    -.0216928    .0047056
          ptd |   1.270864    .570099     2.23   0.026      .153491    2.388238
 ------------------------------------------------------------------------------
 
 . predict xb, xb
 
 . gen top=exp(xb)
 
 . by pairid, sort: egen bot=total(exp(xb))
 
 . gen mypc1=top/bot
 
 . predict pc1, pc1
 
 . summarize pc1 mypc1
 
     Variable |       Obs        Mean    Std. Dev.       Min        Max
 -------------+--------------------------------------------------------
          pc1 |       112          .5    .1882528   .0812115   .9187885
        mypc1 |       112          .5    .1882527   .0812115   .9187885

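The two rows of the summary table agree (up to rounding in the last digit of the standard deviation), which confirms that the hand calculation reproduces Stata's predicted value. It also shows that the predicted value depends on the group: the variable bot (the denominator in the prediction) is a total over all the observations in the group.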

Now, a persistent user may say, “I will pick a group and get the marginal effect using pc1 for just that one group.” Let’s give it a try:

 . webuse lowbirth, clear
 (Applied Logistic Regression, Hosmer & Lemeshow)
  
 . clogit low lwt ptd, group(pairid)
 
 Iteration 0:   log likelihood = -36.962156
 Iteration 1:   log likelihood = -34.637228
 Iteration 2:   log likelihood = -34.569847
 Iteration 3:   log likelihood = -34.569638
 
 Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                   LR chi2(2)      =       8.49
                                                   Prob > chi2     =     0.0143
 Log likelihood = -34.569638                       Pseudo R2       =     0.1094
 
 ------------------------------------------------------------------------------
          low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
          lwt |  -.0084936   .0067344    -1.26   0.207    -.0216928    .0047056
          ptd |   1.270864    .570099     2.23   0.026      .153491    2.388238
 ------------------------------------------------------------------------------
 
 . keep if pairid==1
 (110 observations deleted)
 
 . mfx, predict(pc1) diag(beta)
 
          Predict into observation 1 = .31435782
       Predict into last observation = .68564218
 Predict into all observations: mean = .5
   Predict into all observations: sd = 0
 predict() expression pc1 unsuitable for marginal-effect calculation
 r(119);

This is still no good. Why is that? Looking carefully at the formula for pc1 again, we notice that it doesn’t depend just on the number of observations in the group; it depends on the covariate values of every observation in the group. So predicting into the first observation in the group gives a different answer from predicting into the last, because in each case only one observation’s values were overwritten with the mean values of the covariates while the other observation kept its own.

When we set all observations in the group equal to the mean values of the covariates, we predicted the same value everywhere, 0.5. Why can’t we use that? Look again at the formula for pc1. With identical covariate values in every observation, exp(xb) cancels from the top and bottom, leaving 1/n, where n is the number of observations in the group; in our example that is 1/2. This is a constant function, so all the derivatives are zero. So, no matter how you work it, it’s hopeless to get the marginal effects of pc1.
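
The cancellation can be seen with a short hand calculation in the spirit of the one used earlier for mypc1. This is a sketch only; the variable names xb_mean, bot_mean, and pc1_mean are made up for illustration, and output is not shown. With two observations per group, every value of pc1_mean should equal 1/2, matching the mean of .5 and standard deviation of 0 in the diagnostics.

 . webuse lowbirth, clear
 . quietly clogit low lwt ptd, group(pairid)
 . * set every observation's covariates to the sample means
 . quietly summarize lwt if e(sample)
 . quietly replace lwt = r(mean)
 . quietly summarize ptd if e(sample)
 . quietly replace ptd = r(mean)
 . * rebuild pc1 by hand: exp(xb) over its within-group total
 . predict double xb_mean, xb
 . by pairid, sort: egen double bot_mean = total(exp(xb_mean))
 . generate double pc1_mean = exp(xb_mean)/bot_mean
 . summarize pc1_mean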

If you want mfx to compute a marginal effect even though the predict() option fails the above test, you can use the force option. But remember that mfx operates under the assumption that it does not matter which observation it predicts into, and since it has to predict somewhere, it predicts into the first observation of e(sample).
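
For completeness, the syntax is simply the failing command with force added. Output is not shown, and for pc1 any numbers reported should not be read as meaningful marginal effects, for the reasons given above.

 . webuse lowbirth, clear
 . quietly clogit low lwt ptd, group(pairid)
 . * force tells mfx to go ahead despite the failed suitability check;
 . * the result still depends on which observation is predicted into
 . mfx, predict(pc1) force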

It is possible to obtain a marginal effect after clogit by using the predict() option pu0, the probability of a positive outcome assuming that the fixed effect is zero, as the next example shows:

 . webuse lowbirth, clear
 (Applied Logistic Regression, Hosmer & Lemeshow)

 . clogit low lwt ptd, group(pairid)

 Iteration 0:   log likelihood = -34.641865  
 Iteration 1:   log likelihood = -34.569694  
 Iteration 2:   log likelihood = -34.569638  
 Iteration 3:   log likelihood = -34.569638  

 Conditional (fixed-effects) logistic regression   Number of obs   =        112
                                                   LR chi2(2)      =       8.49
                                                   Prob > chi2     =     0.0143
 Log likelihood = -34.569638                       Pseudo R2       =     0.1094

 ------------------------------------------------------------------------------
          low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
          lwt |  -.0084936   .0067344    -1.26   0.207    -.0216929    .0047056
          ptd |   1.270864   .5701046     2.23   0.026     .1534801    2.388249
 ------------------------------------------------------------------------------

 . mfx, predict(pu0) 

 Marginal effects after clogit
       y  = Pr(low|fixed effect is 0) (predict, pu0)
          =  .31078384
 ------------------------------------------------------------------------------
 variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
 ---------+--------------------------------------------------------------------
      lwt |  -.0018193      .00086   -2.12   0.034    -.0035 -.000139    127.17
      ptd*|    .294058      .14982    1.96   0.050   .000413  .587703   .223214
 ------------------------------------------------------------------------------
 (*) dy/dx is for discrete change of dummy variable from 0 to 1
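
As a rough cross-check, the forward-difference approximation from the start of this FAQ can be applied to pu0 by hand. This sketch assumes that pu0 is the logistic function of the linear prediction, which is what setting the fixed effect to zero suggests, and it uses an arbitrary step of 1e-6; mfx chooses its own step size. The scalar names are made up for illustration, and the scalar() function is used so that a name is never mistaken for a variable abbreviation.

 . webuse lowbirth, clear
 . quietly clogit low lwt ptd, group(pairid)
 . * covariate means over the estimation sample
 . quietly summarize lwt if e(sample)
 . scalar m_lwt = r(mean)
 . quietly summarize ptd if e(sample)
 . scalar m_ptd = r(mean)
 . * f(x) and f(x + h), changing only lwt
 . scalar step = 1e-6
 . scalar f0 = invlogit(_b[lwt]*scalar(m_lwt) + _b[ptd]*scalar(m_ptd))
 . scalar f1 = invlogit(_b[lwt]*(scalar(m_lwt) + scalar(step)) + _b[ptd]*scalar(m_ptd))
 . * prediction at the means, then the forward difference for lwt
 . display scalar(f0)
 . display (scalar(f1) - scalar(f0))/scalar(step)

If the assumption about pu0 holds, the two displayed values should be close to the prediction (.31078384) and the dy/dx for lwt (-.0018193) reported by mfx above. The same approach does not apply to ptd, for which mfx reports a discrete change from 0 to 1 rather than a derivative.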