 »  Home »  Resources & support »  FAQs »  Stata 5: Obtaining predicted probabilities after probit
Note: This FAQ is for users of Stata 5. It is not relevant for more recent versions.

## Stata 5: How can I get predicted probabilities for different x values after probit?

 Title Stata 5: Obtaining predicted probabilities after probit Author William Sribney, StataCorp

The programming techniques used in this answer are very simple in the beginning and very advanced at the end. The simple techniques will work fine, so don't think you must master the advanced ones.

### Predicted probabilities after probit

Consider the following probit:

 . probit y gender age value, nolog

Probit Estimates                                        Number of obs =     16
chi2(3)       =   2.23
Prob > chi2   = 0.5253
Log Likelihood = -9.4680749                             Pseudo R2     = 0.1055

------------------------------------------------------------------------------
y |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
gender |   .7505686   1.118497      0.671   0.502      -1.441645    2.942782
age |  -.0440944   .0937161     -0.471   0.638      -.2277746    .1395857
value |   2.787841    2.10917      1.322   0.186      -1.346057    6.921739
_cons |  -8.506711   5.827864     -1.460   0.144      -19.92912    2.915692
------------------------------------------------------------------------------


The predicted probabilities are given by the formula

pi = F(xi'*beta)

where F is the cumulative normal distribution, xi is the data vector for the i-th observation, and beta is the vector of coefficient estimates.

For this example,

xi = (gender[i], age[i], value[i], 1)

and

beta = (_b[gender], _b[age], _b[value], _b[_cons]).

You can use the _b[gender] syntax to access the coefficients:

   . display _b[gender]
.75056856


(See [U] 20.5 Accessing coefficients and standard errors and [R] matrix get, for details on accessing coefficients after an estimation command.)

The predicted probabilities can be computed by

   . gen phat1 = normprob(_b[gender]*gender + _b[age]*age + _b[value]*value
> + _b[_cons])


but doing it this way is unnecessary. The predict command will do it for you:

 . predict phat2

. list phat1 phat2

phat1      phat2
1.  .1077763   .1077763
2.  .1343292   .1343292
3.  .1866688   .1866688
4.   .198736    .198736
5.  .2112619   .2112619
6.  .2376565   .2376565
7.  .3687274   .3687274
8.  .3837354   .3837354
9.  .4152546   .4152546
10.  .4171898   .4171898
11.   .484975    .484975
12.  .5157192   .5157192
13.  .5828102   .5828102
14.  .5854243   .5854243
15.  .6044933   .6044933
16.   .619428    .619428


### At the mean of x

Often one wants to evaluate predicted probabilities at the mean of x:

mean of x = (mean of gender, mean of age, mean of value)

This can be done by

 . su gender age value

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
gender |      16        .125    .341565          0          1
age |      16      24.125   4.964877         17         35
value |      16      3.2725   .2139315       3.05        3.7

. display normprob(_b[gender]*0.125 + _b[age]*24.125 + _b[value]*3.2725
> + _b[_cons])
.36187213


This is very cumbersome! There is an easier way: use predict to compute the predicted index, take its mean, and take the normprob() of it:

 . predict ihat1, index

. su ihat1

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
ihat1 |      16   -.3534592    .514132  -1.238441   .3039789

. display _result(3)
-.35345925

. display normprob(_result(3))
.36187209


This is what we got before (to within float precision).

Explanation: The index for the i-th observation is xi'*beta. Thus

   . gen ihat2 = _b[gender]*gender + _b[age]*age + _b[value]*value + _b[_cons]


also gives the index.

 . list ihat1 ihat2

ihat1      ihat2
1. -1.238441  -1.238441
2. -1.106158  -1.106158
3. -.8902391  -.8902391
4. -.8461447  -.8461447
5. -.8020502  -.8020502
6. -.7138613  -.7138613
7. -.3352256  -.3352256
8. -.2956849  -.2956849
9. -.2140487  -.2140487
10. -.2090879  -.2090879
11. -.0376709  -.0376709
12.  .0394123   .0394123
13.  .2090879   .2090879
14.  .2157901   .2157901
15.  .2649949   .2649949
16.  .3039789   .3039789


The key to our shortcut is to note that

mean of ihat1 = _b[gender]*(mean of gender) + _b[age]*(mean of age) + _b[value]*(mean of value) + _b[_cons]

### At other values of x

One can do what we did initially

   . display normprob(_b[gender]*0.125 + _b[age]*24.125 + _b[value]*3.2725
> + _b[_cons])


and just substitute in different values for x = (gender, age, value).

This would be a big pain for a model with lots of independent variables.

In this case, it would be easier to use Stata’s matrix language:

1. First put x = (mean of gender, mean of age, mean of value) in a vector:
 . gen cons = 1

. matrix vecaccum mean = cons gender age value [iw=1/_N]

. matrix list mean

mean[1,4]
gender     age   value   _cons
cons    .125  24.125  3.2725       1

CAUTION: Make sure the order of the variables is the same here as it is in the probit output.

If you want to understand what matrix vecaccum does, see [R] matrix accum in the manual.

Always double-check that the values in the vector mean are correct by using summarize.

 . su gender age value

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
gender |      16        .125    .341565          0          1
age |      16      24.125   4.964877         17         35
value |      16      3.2725   .2139315       3.05        3.7

2. Get the coefficients as a vector:
 . matrix b = get(_b)

. matrix list b

b[1,4]
gender         age       value       _cons
y1   .75056856  -.04409444    2.787841  -8.5067114

3. Get the index, then take normprob() of it:
 . matrix index = mean*b'

. matrix list index

symmetric index[1,1]
y1
cons  -.35345924

. display normprob(index[1,1])
.36187209


This is the same answer that we got before.

4. Now, to do it at different values of x, we can just change some of the entries of the vector mean:
   . matrix x = mean

. matrix x[1,1] = 0

. matrix list x

x[1,4]
gender     age   value   _cons
cons       0  24.125  3.2725       1

. matrix index = x*b'

. display normprob(index[1,1])
.32733634


Here’s a fancier example:

 . su gender age value if gender==0

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
gender |      14           0          0          0          0
age |      14    23.57143   5.079586         17         35
value |      14    3.279286   .2270366       3.05        3.7

. matrix vecaccum x = cons gender age value [iw=1/14] if gender==0

. matrix list x

x[1,4]
gender        age      value      _cons
cons          0  23.571429  3.2792857          1

. matrix index = x*b'

. display normprob(index[1,1])
.34312349