
From: Alexander Cavallo <ACavallo@NavigantConsulting.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: prediction in binary choice model
Date: Thu, 18 Aug 2005 14:07:17 -0500

Dear StataList,

I have a question about prediction in the case of binary choice models. Suppose I estimate a probit model:

    Ystar(i,j) = X(i,j)*BetaX + W(i,j)*BetaW + Z(j)*Gamma + u(i,j)
    Y(i,j) = 1 if Ystar(i,j) > 0, and 0 otherwise

where
    i indexes persons
    j indexes countries
    X(i,j) and W(i,j) are characteristics of persons
    Z(j) are characteristics of countries
    u(i,j) are the error terms

I am interested in the simulated effects of changes in X(i,j) and W(i,j) on the expected number of individuals with Yhat(i,j) = 1. In particular, suppose W(i,j) is unchanged, but in the simulation X(i,j) is replaced by X(i,j) + W(i,j).

There are two ways to do the calculation.

Method 1. Sum of predicted probabilities
Predict the new probabilities [Psim(i,j)] after changing X(i,j) as indicated. Sum up the probabilities within country.

Method 2. Predicted count
Predict the new probabilities after changing X(i,j). Then predict a new indicator variable that equals 1 if the simulated probability reaches the threshold for country j, and count the 1s within country:

    Ysim(i,j) = 1 if Psim(i,j) >= Cutoff(j)

The literature suggests that the threshold level is arbitrary [see Greene's textbook "Econometric Analysis" for a discussion of prediction in binary choice models]. Suppose I use the naive threshold of 0.50. I find very different results using the two methods.

Here is a stylized example. Assume that there are 1000 observations, that baseline predicted probabilities are uniform on the interval [0.20, 0.70], and that the threshold for prediction is 0.50. In this case there are 400 observations with predicted outcome 1. The uniform density on that interval is 1/0.50 = 2, so the sum of predicted probabilities in the baseline case is 1000 times the integral from 0.20 to 0.70 of 2*x*dx, which is 450 (equivalently, 1000 times the mean probability of 0.45).

Suppose that the change causes an increase in each predicted probability of 10 percentage points. Then the new count of obs above the threshold is 600, and the sum of predicted probabilities is 550.

Here are the results of the exercise:

                 Count of obs        Sum of predicted
                 with P(i,j)>0.50    probabilities
    Baseline          400                 450
    Simulated         600                 550
    Delta            +200                +100
    % Delta          +50%                +22%

My questions are:

1. What is the right way to form aggregate-level predictions for baseline and simulated data?

2. I could change the threshold Cutoff(j) so that the count of obs with P(i,j) > Cutoff(j) matches the sample count of 1s. Is there a literature on the optimal threshold for prediction?

Thanks for your help!

--Alex Cavallo
Managing Consultant
Navigant Consulting, Inc.
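
The stylized example can be checked numerically. The sketch below is only an illustration, not part of the original analysis: it replaces the uniform distribution with 1000 evenly spaced midpoints on [0.20, 0.70], and all function names are hypothetical. It relies on the fact that the sum of predicted probabilities is the number of observations times the mean probability (0.45 at baseline, 0.55 after the shift). The last helper illustrates the idea in question 2, picking a cutoff so that the predicted count matches a given number of 1s; the target of 450 used there is an assumed value, not a figure from the data.

```python
from math import fsum

N = 1000        # number of observations in the stylized example
SHIFT = 0.10    # simulated increase in every predicted probability
CUTOFF = 0.50   # naive classification threshold


def grid_probabilities(n, low, high):
    """Evenly spaced midpoints on [low, high]: a deterministic stand-in
    for draws from a uniform distribution (an assumption of this sketch)."""
    width = high - low
    return [low + width * (k + 0.5) / n for k in range(n)]


def sum_of_probabilities(probs):
    """Method 1: expected number of 1s is the sum of predicted probabilities."""
    return fsum(probs)


def count_above(probs, cutoff):
    """Method 2: number of observations classified as 1 at the given cutoff."""
    return sum(1 for p in probs if p >= cutoff)


def matching_cutoff(probs, n_ones):
    """Cutoff at which exactly n_ones observations satisfy p >= cutoff.
    Assumes no tied probabilities and 1 <= n_ones <= len(probs)."""
    return sorted(probs, reverse=True)[n_ones - 1]


baseline = grid_probabilities(N, 0.20, 0.70)
simulated = [p + SHIFT for p in baseline]

for label, probs in (("Baseline", baseline), ("Simulated", simulated)):
    print(f"{label:<10} count = {count_above(probs, CUTOFF):4d}   "
          f"sum = {sum_of_probabilities(probs):7.2f}")

# Hypothetical target: suppose the sample contains 450 observed 1s.
assumed_sample_ones = 450
cutoff = matching_cutoff(baseline, assumed_sample_ones)
print(f"Cutoff matching {assumed_sample_ones} ones: {cutoff:.5f}")
```

On this grid the counts come out as 400 and 600 and the sums as 450 and 550. The count reacts only to observations that cross the cutoff, while the sum reacts to every observation's change in probability, which is why the two aggregates can move by such different amounts.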
