Home  /  Resources & support  /  FAQs  /  Stata 5: Obtaining predicted probabilities after probit
Note: This FAQ is for users of Stata 5. It is not relevant for more recent versions.

Stata 5: How can I get predicted probabilities for different x values after probit?

Title   Stata 5: Obtaining predicted probabilities after probit
Author William Sribney, StataCorp

The programming techniques used in this answer are very simple in the beginning and very advanced at the end. The simple techniques will work fine, so don't think you must master the advanced ones.

Predicted probabilities after probit

Consider the following probit:

 . probit y gender age value, nolog 
  
 Probit Estimates                                        Number of obs =     16
                                                         chi2(3)       =   2.23
                                                         Prob > chi2   = 0.5253
 Log Likelihood = -9.4680749                             Pseudo R2     = 0.1055
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
   gender |   .7505686   1.118497      0.671   0.502      -1.441645    2.942782
      age |  -.0440944   .0937161     -0.471   0.638      -.2277746    .1395857
    value |   2.787841    2.10917      1.322   0.186      -1.346057    6.921739
    _cons |  -8.506711   5.827864     -1.460   0.144      -19.92912    2.915692
 ------------------------------------------------------------------------------

The predicted probabilities are given by the formula

pi = F(xi'*beta)

where F is the cumulative normal distribution, xi is the data vector for the i-th observation, and beta is the vector of coefficient estimates.

For this example,

xi = (gender[i], age[i], value[i], 1)

and

beta = (_b[gender], _b[age], _b[value], _b[_cons]).

You can use the _b[gender] syntax to access the coefficients:

   . display _b[gender]
   .75056856

(See [U] 20.5 Accessing coefficients and standard errors and [R] matrix get, for details on accessing coefficients after an estimation command.)

The predicted probabilities can be computed by

   . gen phat1 = normprob(_b[gender]*gender + _b[age]*age + _b[value]*value 
   > + _b[_cons])

but doing it this way is unnecessary. The predict command will do it for you:

 . predict phat2
   
 . list phat1 phat2
 
          phat1      phat2  
   1.  .1077763   .1077763  
   2.  .1343292   .1343292  
   3.  .1866688   .1866688  
   4.   .198736    .198736  
   5.  .2112619   .2112619  
   6.  .2376565   .2376565  
   7.  .3687274   .3687274  
   8.  .3837354   .3837354  
   9.  .4152546   .4152546  
  10.  .4171898   .4171898  
  11.   .484975    .484975  
  12.  .5157192   .5157192  
  13.  .5828102   .5828102  
  14.  .5854243   .5854243  
  15.  .6044933   .6044933  
  16.   .619428    .619428  

Evaluating predicted probabilities for particular values of x

At the mean of x

Often one wants to evaluate predicted probabilities at the mean of x:

mean of x = (mean of gender, mean of age, mean of value)

This can be done by

 . su gender age value
   
 Variable |     Obs        Mean   Std. Dev.       Min        Max
 ---------+-----------------------------------------------------
   gender |      16        .125    .341565          0          1  
      age |      16      24.125   4.964877         17         35  
    value |      16      3.2725   .2139315       3.05        3.7  
 
 . display normprob(_b[gender]*0.125 + _b[age]*24.125 + _b[value]*3.2725 
 > + _b[_cons])
 .36187213

This is very cumbersome! There is an easier way: use predict to compute the predicted index, take its mean, and take the normprob() of it:

 . predict ihat1, index
   
 . su ihat1
 
 Variable |     Obs        Mean   Std. Dev.       Min        Max
 ---------+-----------------------------------------------------
    ihat1 |      16   -.3534592    .514132  -1.238441   .3039789  
 
 . display _result(3)
 -.35345925
 
 . display normprob(_result(3))
 .36187209

This is what we got before (to within float precision).

Explanation: The index for the i-th observation is xi'*beta. Thus

   . gen ihat2 = _b[gender]*gender + _b[age]*age + _b[value]*value + _b[_cons]

also gives the index.

 . list ihat1 ihat2
   
          ihat1      ihat2  
   1. -1.238441  -1.238441  
   2. -1.106158  -1.106158  
   3. -.8902391  -.8902391  
   4. -.8461447  -.8461447  
   5. -.8020502  -.8020502  
   6. -.7138613  -.7138613  
   7. -.3352256  -.3352256  
   8. -.2956849  -.2956849  
   9. -.2140487  -.2140487  
  10. -.2090879  -.2090879  
  11. -.0376709  -.0376709  
  12.  .0394123   .0394123  
  13.  .2090879   .2090879  
  14.  .2157901   .2157901  
  15.  .2649949   .2649949  
  16.  .3039789   .3039789  

The key to our shortcut is to note that

mean of ihat1 = _b[gender]*(mean of gender) + _b[age]*(mean of age) + _b[value]*(mean of value) + _b[_cons]

At other values of x

One can do what we did initially

   . display normprob(_b[gender]*0.125 + _b[age]*24.125 + _b[value]*3.2725 
   > + _b[_cons])

and just substitute in different values for x = (gender, age, value).

This would be a big pain for a model with lots of independent variables.

In this case, it would be easier to use Stata’s matrix language:

  1. First put x = (mean of gender, mean of age, mean of value) in a vector:
     . gen cons = 1
       
     . matrix vecaccum mean = cons gender age value [iw=1/_N]
     
     . matrix list mean
     
     mean[1,4]
           gender     age   value   _cons
     cons    .125  24.125  3.2725       1
    
    CAUTION: Make sure the order of the variables is the same here as it is in the probit output.

    If you want to understand what matrix vecaccum does, see [R] matrix accum in the manual.

    Always double-check that the values in the vector mean are correct by using summarize.

     . su gender age value
       
     Variable |     Obs        Mean   Std. Dev.       Min        Max
     ---------+-----------------------------------------------------
       gender |      16        .125    .341565          0          1  
          age |      16      24.125   4.964877         17         35  
        value |      16      3.2725   .2139315       3.05        3.7  
    
  2. Get the coefficients as a vector:
     . matrix b = get(_b)
      
     . matrix list b
     
     b[1,4]
             gender         age       value       _cons
     y1   .75056856  -.04409444    2.787841  -8.5067114
    
  3. Get the index, then take normprob() of it:
     . matrix index = mean*b'
     
     . matrix list index
     
     symmetric index[1,1]
                   y1
     cons  -.35345924
     
     . display normprob(index[1,1])
     .36187209
    

    This is the same answer that we got before.


  4. Now, to do it at different values of x, we can just change some of the entries of the vector mean:
       . matrix x = mean
      
       . matrix x[1,1] = 0
       
       . matrix list x
       
       x[1,4]
             gender     age   value   _cons
       cons       0  24.125  3.2725       1
       
       . matrix index = x*b'
       
       . display normprob(index[1,1])
       .32733634
    

Here’s a fancier example:

 . su gender age value if gender==0
 
 Variable |     Obs        Mean   Std. Dev.       Min        Max
 ---------+-----------------------------------------------------
   gender |      14           0          0          0          0  
      age |      14    23.57143   5.079586         17         35  
    value |      14    3.279286   .2270366       3.05        3.7  
 
 . matrix vecaccum x = cons gender age value [iw=1/14] if gender==0
 
 . matrix list x
 
 x[1,4]
          gender        age      value      _cons
 cons          0  23.571429  3.2792857          1
 
 . matrix index = x*b'
 
 . display normprob(index[1,1])
 .34312349

For more information

See the following sections of the Stata 5.0 Documentation:

[U] 20.5 Accessing coefficients and standard errors, [R] logit, [R] matrix accum, [R] matrix define, [R] matrix get, [R] predict, [R] summarize