Statalist The Stata Listserver


Re: st: Predicted probabilities after oprobit w/robust standard errors

From   Richard Williams
Subject   Re: st: Predicted probabilities after oprobit w/robust standard errors
Date   Fri, 02 Jun 2006 17:14:10 -0500

At 02:40 PM 6/2/2006, Nick Winter wrote:

You are confusing the (sampling) variance of the various estimates with the variance of the underlying distribution. The latter is normalized to one regardless of the technique used to estimate the sampling variances.

Well put. And to try it one other way - let's say a particular case has a predicted probability of 30% of being in category 1. But that 30% is itself an estimate. The 95% confidence interval for it might run from, say, 24% to 36%.

In an OLS regression, you have a single predicted value. In oprobit and other multi-outcome techniques, you have more than one: a predicted probability for each outcome category. In all of these techniques, the predicted value is your "best guess" as to the true value. But, because of sampling variability, your best guess may be too high or too low.
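To make the "more than one predicted value" point concrete, here is a minimal sketch in Python (not Stata) of how ordered probit predicted probabilities are formed. The linear prediction xb and the cutpoints are made-up values, and oprobit_probs is a hypothetical helper, not a Stata or library function. Note that the error variance is fixed at 1 inside the formula, so the standard errors of the coefficients never enter the calculation.

```python
from statistics import NormalDist

def oprobit_probs(xb, cuts):
    """Predicted probability of each category in an ordered probit.

    xb   : linear prediction for one case (hypothetical value)
    cuts : increasing cutpoints (hypothetical values)
    The latent error is standard normal: its variance is normalized to 1.
    """
    phi = NormalDist(mu=0.0, sigma=1.0).cdf
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cum = [phi(c - xb) for c in cuts] + [1.0]
    # Category probabilities are differences of adjacent cumulative values.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Three categories need two cutpoints; one probability comes back per category.
probs = oprobit_probs(xb=0.5, cuts=[-0.5, 1.0])
print([round(p, 3) for p in probs])
print("sum of probabilities:", round(sum(probs), 6))
```

Each of the three numbers is a point estimate in its own right, and each would carry its own confidence interval.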

In terms similar to how Matt is putting it - suppose your OLS predicted value was $10,000, with a confidence interval that ran $1,000 either way. Then you specify robust standard errors, and all of a sudden the predicted value is still $10,000 but with a confidence interval that runs a million dollars either way. (Hopefully this would never actually happen!) Well, I suppose you could say that, in the latter case, there is a greater probability that the person is actually a millionaire than in the first case. But our "best guess" is still $10,000. Likewise, in an oprobit, our best guess of the probability of being in category 1 is going to stay at, say, 15%, but huge standard errors are going to make us less confident of how accurate that prediction is.
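A rough sketch of the same idea in Python (not Stata), for OLS with a single regressor. The data are invented, ols_with_ses is a hypothetical helper, and the robust formula is the usual heteroskedasticity-consistent sandwich estimator with an n/(n-2) small-sample factor, assumed here to stand in for what a robust option would report. The coefficients are computed once; only the standard error of the slope changes.

```python
from math import sqrt

def ols_with_ses(x, y):
    """Simple OLS of y on x; returns intercept, slope, and two slope SEs."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical SE: assumes one common residual variance.
    se_classical = sqrt(sum(e ** 2 for e in resid) / (n - 2) / sxx)
    # Robust (sandwich) SE: weights each squared residual separately.
    meat = sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_robust = sqrt(n / (n - 2) * meat / sxx ** 2)
    return a, b, se_classical, se_robust

# Hypothetical data whose spread grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.5, 7.2, 11.0, 10.1, 16.0, 12.5]

a, b, se_classical, se_robust = ols_with_ses(x, y)
pred = a + b * 5.0  # the "best guess" at x = 5 uses only a and b

print("slope:", round(b, 4), "prediction at x=5:", round(pred, 4))
print("classical SE of slope:", round(se_classical, 4))
print("robust SE of slope:   ", round(se_robust, 4))
```

Whichever standard error is reported, a, b, and the prediction are the same numbers; what changes is how wide an interval you would draw around them.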

The ideas of sampling variability and heterogeneity may also be getting confounded here. You may have reason to believe there is heterogeneity in the residual variability, e.g. there is more variability for women than for men. If so, a location-scale (aka heterogeneous choice) model may be appropriate. But heterogeneity is different from sampling variability. Sampling variability is a characteristic of the sample, and things like drawing a larger sample will generally reduce it. Heterogeneity is a characteristic of the population; even if you had the entire population in your sample, a failure to control for heterogeneity could bias your parameter estimates in a logit or probit analysis. See, for example,

Allison, Paul. 1999. "Comparing Logit and Probit Coefficients Across Groups." Sociological Methods and Research 28(2): 186-208.
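As a small illustration of why unmodeled heterogeneity matters, here is a sketch in Python (not Stata) of a binary probit in which the residual standard deviation differs by group. All numbers are hypothetical, and prob_y1 is an illustrative helper, not part of any package. Both groups share the same underlying linear prediction; only the residual scale differs.

```python
from statistics import NormalDist

def prob_y1(xb, sigma=1.0):
    """P(y = 1) in a probit whose latent error has standard deviation sigma."""
    return NormalDist().cdf(xb / sigma)

xb = 1.0            # same underlying linear prediction for both groups
sigma_men = 1.0     # hypothetical residual SD for men
sigma_women = 2.0   # hypothetical: more residual variability for women

p_men = prob_y1(xb, sigma_men)
p_women = prob_y1(xb, sigma_women)
print("men:  ", round(p_men, 3))
print("women:", round(p_women, 3))

# An ordinary probit fixes sigma at 1 in each group, so what it recovers
# is xb / sigma: the women's effect looks half as large here even though
# the underlying effect is identical.
print("implied standardized index:", xb / sigma_men, "vs", xb / sigma_women)
```

No amount of additional sample would make the two groups' standardized coefficients agree in this setup, which is the sense in which heterogeneity differs from sampling variability.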

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu