
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Heteroskedastic Probit Model

From   Maarten buis <>
Subject   Re: st: Heteroskedastic Probit Model
Date   Thu, 22 Apr 2010 08:57:53 -0700 (PDT)

--- On Thu, 22/4/10, Mustafa Brahim wrote:
> Just a matter of curiosity. How you
> know it is not there in the first
> place? that's exactly what I want to  understand.

It may be easiest to start with a case where
such information is present. Say we want to know
the "effect" on price of being a foreign (non-US)
rather than a domestic (US) car. We might type:

sysuse auto, clear
reg price foreign

We find an "effect" of foreign of $312.26.
The information for this effect comes from the
difference between the average price of foreign
cars and the average price of domestic cars:

sum price if foreign == 1
local for = r(mean)
sum price if foreign == 0
local dom = r(mean)

di `for' - `dom'
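The regression coefficient and the difference in means are the same number because the only regressor is a 0/1 indicator. A small check, written as a sketch (not part of the original post), that puts both side by side and asserts that they agree:

```stata
sysuse auto, clear
* average price within each group
quietly summarize price if foreign == 1
local for = r(mean)
quietly summarize price if foreign == 0
local dom = r(mean)
* regression of price on a single 0/1 dummy
quietly regress price foreign
* the slope on the dummy equals the difference in group means
display _b[foreign]
display `for' - `dom'
assert reldif(_b[foreign], `for' - `dom') < 1e-8
```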

Now we move a bit further away from the data.
Say we want to know whether there is
heteroskedasticity of the type that the
residual variance differs between foreign
and domestic cars. What kind of information
is then available? We can compute the
residuals, that is, the difference between the
observed price and the predicted price, and
then we can compute the spread (standard
deviation) of these residuals separately for
foreign cars and domestic cars:

reg price foreign
predict yhat
gen resid = price - yhat
tabstat resid, by(foreign) statistics(sd)
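Comparing the two standard deviations by eye can be complemented by a formal test. A sketch, assuming the classical variance-ratio F test of Stata's -sdtest- is acceptable here (it relies on normally distributed residuals; -robvar- reports more robust alternatives):

```stata
sysuse auto, clear
quietly regress price foreign
* residuals: observed minus fitted price
predict double resid, residuals
* variance-ratio (F) test that the residual SD is the
* same for domestic (foreign==0) and foreign (foreign==1) cars
sdtest resid, by(foreign)
```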

Now we move still a bit further away from the
data. We think that there is some latent
propensity (y*) in every car for being foreign, 
and that this latent propensity has the form

y* = b0 + b1 mpg + e

e is a normally (Gaussian) distributed error term,
with mean 0 and variance 1.

The problem is that we don't observe y*; instead we observe
foreign = 1 if y* >  0
foreign = 0 if y* <= 0
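The step from this observation rule to a probability is short but worth writing out. It uses only the assumption that e is standard normal, and hence symmetric around 0:

```latex
\begin{aligned}
\Pr(\text{foreign}=1) &= \Pr(y^* > 0) \\
&= \Pr(b_0 + b_1\,\text{mpg} + e > 0) \\
&= \Pr\bigl(e > -(b_0 + b_1\,\text{mpg})\bigr) \\
&= \Pr(e < b_0 + b_1\,\text{mpg}) \\
&= \Phi(b_0 + b_1\,\text{mpg})
\end{aligned}
```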

So what is the empirical information we have to
estimate this model?

This model implies a certain relation between the
probability of being foreign and mpg; to be precise,
Pr(foreign==1) = Phi(b0 + b1 mpg), where Phi is
the cumulative distribution function of the standard
normal distribution. We can compute the observed
proportion of foreign cars at each value of mpg and
try out different values of b0 and b1 such that the
predicted probabilities are as close as possible to
the observed proportions. This is, in essence, the
probit model:

probit foreign mpg
predict pr
collapse (mean) pr foreign, by(mpg)
scatter pr foreign mpg

Notice that the data do not give a lot of information
on the exact shape of the relationship (wide scatter).
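Part of the reason the scatter is so wide is that each point is an average over only the cars that share one value of mpg. A sketch (not part of the original post) for counting how many cars lie behind each point; the variable names n_mpg and first are just illustrative:

```stata
sysuse auto, clear
* number of cars sharing each value of mpg
bysort mpg: generate n_mpg = _N
* tag one observation per distinct value of mpg
egen byte first = tag(mpg)
* how many cars each point in the scatter is based on
tabulate n_mpg if first
summarize n_mpg if first
```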

Let's now take yet another step away from the data and
move towards the -hetprob- model. The residuals that
are being modeled in that model are the error term e,
that is, the difference between y* and its predicted
value b0 + b1 mpg. The problem is that y* is not
observed. So what information is present in our data
to estimate such a model? Well, if we are willing to
assume a certain functional form for the relationship
between the variance of e and some variables, we could
use the same trick as with -probit-. But the information
we are then trying to use is that these assumptions
imply subtle changes in the shape of the relationship
between the observed variable and the average (i.e.
the probability) of foreign, and, as we saw in the
previous graph, the data contain only very rough
information on the shape of that relationship. So that
is what I meant when I said that the necessary
information isn't there in the first place.
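
To make the "subtle changes in shape" concrete, here is the model written out, assuming for illustration that the standard deviation of e depends on mpg through an exponential function, which is the form -hetprob- uses. Here the same variable, mpg, appears in both equations, and g1 is the heteroskedasticity parameter:

```latex
\begin{aligned}
y^* &= b_0 + b_1\,\text{mpg} + e, \qquad
e \sim N\!\bigl(0,\ [\exp(g_1\,\text{mpg})]^2\bigr) \\
\Pr(\text{foreign}=1) &= \Pr\!\left(\frac{e}{\exp(g_1\,\text{mpg})} < \frac{b_0 + b_1\,\text{mpg}}{\exp(g_1\,\text{mpg})}\right)
= \Phi\!\left(\frac{b_0 + b_1\,\text{mpg}}{\exp(g_1\,\text{mpg})}\right)
\end{aligned}
```

With g1 = 0 this reduces to the ordinary probit curve Phi(b0 + b1 mpg). A nonzero g1 only bends that curve, stretching or compressing the S-shape along mpg, so g1 can only be learned from the shape of the relationship between mpg and the proportion of foreign cars, which is exactly the information the scatter showed to be rough.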

hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index