
How do I obtain confidence intervals for the predicted probabilities after logistic regression?

Title:   Prediction confidence intervals after logistic regression
Author:  Mark Inlow, StataCorp
Date:    April 1999; minor revisions July 2007

Using predict after logistic to get predicted probabilities and confidence intervals is somewhat tricky. The following two commands will give you predicted probabilities:

. logistic ...
. predict phat

The following does not give you the standard error of the predicted probabilities:

. logistic ...
. predict se_phat, stdp

Despite the name we chose, se_phat does not contain the standard error of phat. Instead, it contains the standard error of the predicted index. The index is the linear combination of the estimated coefficients and the values of the independent variables for each observation in the dataset. Suppose we fit the following logistic regression model:

. logistic y x 

This command estimates b0 and b1 in the following model:

P(y = 1) = exp(b0+b1*x)/(1 + exp(b0+b1*x))

Here the index is b0 + b1*x. We could get predicted values of the index and its standard error as follows:

. logistic y x
. predict lr_index, xb
. predict se_index, stdp

We could transform our predicted value of the index into a predicted probability as follows:

. generate p_hat = exp(lr_index)/(1+exp(lr_index))
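
As a quick check of the transformation above, the following do-file sketch compares the hand-computed probability with the default output of predict. It assumes the auto dataset shipped with Stata, with foreign and mpg as illustrative stand-ins for y and x; those choices are not part of the original example. It also uses Stata's built-in invlogit() function, which computes the same inverse-logit transformation.

```stata
* Illustrative assumption: auto.dta, with foreign as y and mpg as x
sysuse auto, clear
logistic foreign mpg

predict phat                  // default: predicted probability
predict lr_index, xb          // linear index b0 + b1*mpg

* Transform the index to a probability by hand, two equivalent ways
generate p_hat  = exp(lr_index)/(1 + exp(lr_index))
generate p_hat2 = invlogit(lr_index)

* The three variables should agree up to float rounding
list phat p_hat p_hat2 in 1/5
```

If the three columns agree, the default prediction is the inverse logit of the index.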

This is just what predict does by default after a logistic regression if no options are specified. Using a similar procedure, we can get a 95% confidence interval for our predicted probabilities by first generating the lower and upper bounds of a 95% confidence interval for the index and then converting these to probabilities:

. gen lb = lr_index - invnormal(0.975)*se_index
. gen ub = lr_index + invnormal(0.975)*se_index
. gen plb = exp(lb)/(1+exp(lb))
. gen pub = exp(ub)/(1+exp(ub))
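
Put together, the steps above might look like the following do-file sketch. The dataset (auto.dta) and the variables foreign and mpg are assumptions chosen only so that the commands run as written; substitute your own outcome and covariates.

```stata
* Illustrative assumption: auto.dta, with foreign as y and mpg as x
sysuse auto, clear
logistic foreign mpg

predict phat                  // predicted probability
predict lr_index, xb          // predicted index
predict se_index, stdp        // standard error of the index

* 95% interval on the index scale, then mapped to the probability scale
generate lb  = lr_index - invnormal(0.975)*se_index
generate ub  = lr_index + invnormal(0.975)*se_index
generate plb = exp(lb)/(1 + exp(lb))
generate pub = exp(ub)/(1 + exp(ub))

* Inspect the predictions and their bounds for a few observations
list mpg phat plb pub in 1/5
```

Because the inverse-logit transformation is increasing, plb and pub should bracket phat for every observation.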

Computing the confidence interval for the index and then transforming its endpoints to probabilities is better than estimating the standard error of the predicted probability and building the interval directly from that standard error. The distribution of the predicted index is closer to normal than that of the predicted probability, and the transformed endpoints always lie between 0 and 1.
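
To see the difference between the two approaches, the sketch below builds both intervals side by side. It is only an illustration under stated assumptions: it again uses auto.dta with foreign and mpg as stand-ins, and it approximates the standard error of the predicted probability with the first-order (delta-method) formula phat*(1 - phat)*se_index, which is one common way to obtain that standard error and not necessarily the only one. The variable names naive_lb and naive_ub are hypothetical.

```stata
* Illustrative assumption: auto.dta, with foreign as y and mpg as x
sysuse auto, clear
logistic foreign mpg

predict phat                  // predicted probability
predict lr_index, xb          // predicted index
predict se_index, stdp        // standard error of the index

* Interval built on the index scale, then transformed
generate plb = invlogit(lr_index - invnormal(0.975)*se_index)
generate pub = invlogit(lr_index + invnormal(0.975)*se_index)

* For comparison: delta-method standard error of phat (an approximation)
generate se_phat  = phat*(1 - phat)*se_index
generate naive_lb = phat - invnormal(0.975)*se_phat
generate naive_ub = phat + invnormal(0.975)*se_phat

* The naive bounds are not constrained to [0, 1]; the transformed ones are
count if naive_lb < 0 | naive_ub > 1
count if plb < 0 | pub > 1
```

The first count reports how many observations have naive bounds outside the unit interval. The second count should be zero, because the inverse logit cannot produce values outside [0, 1].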
