 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Heteroskedastic Probit Model

 From: Maarten Buis
 To: statalist@hsphsun2.harvard.edu
 Subject: Re: st: Heteroskedastic Probit Model
 Date: Thu, 22 Apr 2010 08:57:53 -0700 (PDT)

```
--- On Thu, 22/4/10, Mustafa Brahim wrote:
> Just a matter of curiosity. How you
> know it is not there in the first
> place? that's exactly what I want to  understand.

It may be easiest to start with a case where
such information is present: say we want to know
what the "effect" of being a foreign (non-US)
rather than a domestic (US) car is on the price.
We might type:

sysuse auto, clear
reg price foreign

We find an "effect" of foreign of $312.26.
The information for this effect comes from the
average prices within the groups of foreign
and domestic cars:

sum price if foreign == 1
local for = r(mean)
sum price if foreign == 0
local dom = r(mean)

di `for' - `dom'
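
* Cross-check (an added sketch, not part of the original post): the
* difference in group means should equal the coefficient on foreign
* that -regress- stores in _b[foreign].
quietly reg price foreign
di _b[foreign]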

Now we move a bit further away from the data.
Say we want to know whether there is
heteroskedasticity of the type that the
residual variance differs between foreign
and domestic cars. What kind of information
is then available? We can compute the
residuals, that is, the difference between the
predicted price and the observed price, and
then we can compute the standard deviation of
these residuals separately for foreign cars and
domestic cars:

reg price foreign
predict yhat
gen resid = yhat - price
table foreign, c(sd resid)
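
* Added sketch (not part of the original post): -tabstat- gives the
* same group-wise standard deviations, and -sdtest- is one possible
* formal comparison of the two residual variances. Both assume the
* variable resid created above.
tabstat resid, by(foreign) statistics(sd)
sdtest resid, by(foreign)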

Now we move still a bit further away from the
data. We think that there is some latent
propensity (y*) in every car for being foreign,
and that this latent propensity has the form

y* = b0 + b1 mpg + e

e is a normally/Gaussian distributed error term,
with mean 0 and variance 1.

Problem is we don't observe y*, instead we observe
foreign:
foreign = 1 if y* >  0
foreign = 0 if y* <= 0

So what is the empirical information we have to
estimate this model?

This model implies a certain relation between the
probability of being foreign and mpg, to be precise:
Pr(foreign==1) = Phi(b0 + b1 mpg), where Phi is
the cumulative distribution function of the standard
normal distribution. We can compute the observed
proportion of foreign cars at each value of mpg and
try out different values of b0 and b1 until the predicted
probabilities are as similar as possible to the observed
proportions. This is the probit model:

probit foreign mpg
predict pr
collapse (mean) pr foreign, by(mpg)
scatter pr foreign mpg

Notice that the data do not give a lot of information
on the exact shape of the relationship (wide scatter).
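
* Added sketch (not part of the original post): a small simulation
* of the latent-variable set-up, to see that -probit- can recover
* b0 and b1 even though y* itself is never used. The values
* b0 = -2 and b1 = 0.1, the seed, the sample size and the
* distribution of x are all made up for illustration.
preserve
clear
set seed 12345
set obs 1000
gen x = rnormal(21, 6)
* latent propensity with a standard normal error term
gen ystar = -2 + 0.1*x + rnormal()
* only the sign of y* is observed
gen y = ystar > 0
probit y x
restore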

Let's now take yet another step away from the data and
move towards the -hetprob- model. The residuals that
are being modeled in that model are the difference
between y* and its predicted value. The
problem is that y* is not observed. So what information
is present in our data to estimate such a model? Well,
if we are willing to assume a certain functional form
for the relationship between the variance of e and
some variables, we could use the same trick as with
-probit-, but the information we are trying to use then
is that these assumptions imply subtle changes in the
shape of the relationship between the observed variable
and the average (i.e. probability) of foreign, and as
we saw in the previous graph, the data contain only
very rough information on the shape of that
relationship. So that is what I meant when I said that
the necessary information isn't there in the first
place.
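
* Added sketch (not part of the original post): the heteroskedastic
* probit replaces the probit probability with
*   Pr(foreign==1) = Phi( (b0 + b1 mpg) / exp(g z) )
* where z are the variables in the variance equation. Using mpg as z
* is only an illustrative assumption. With so few observations the
* model may converge slowly or not at all, and the variance equation
* is likely to be estimated very imprecisely. In more recent Stata
* releases the command is named -hetprobit-.
sysuse auto, clear
hetprob foreign mpg, het(mpg)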

hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

```