
Re: st: About the Heckman selection model

From   Maarten Buis <>
Subject   Re: st: About the Heckman selection model
Date   Wed, 17 Feb 2010 04:58:49 -0800 (PST)

--- On Wed, 17/2/10, Maria Quattri wrote:
> 1) Both the coefficients for the Probit and those for the
> OLS seem to have no direct interpretation.  Therefore,
> I would consider the significance of marginal effects only:
> Pr(y observed) for the Probit  and E(y|y observed) for
> the OLS. Is that right?

No. E(y | y observed) in particular is usually not the most 
interesting outcome. More often you would want to look at 
either E(y) (the dependent variable as one would observe it, 
thus including the censored observations) or E(y*) (the 
latent dependent variable). When trying to understand your 
results, you want to look at all of them.

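To make the three quantities concrete, here is a minimal sketch for a single observation, assuming the textbook Heckman setup: outcome error with standard deviation sigma, selection error standard normal, correlation rho, and y observed when the selection index is positive. It further assumes that "E(y)" counts unobserved outcomes as 0. These roughly correspond to Stata's `predict` options `xb`, `ycond` and `yexpected` after `heckman`. The function name `heckman_expectations` and the parameter values are hypothetical and chosen for illustration only.

```python
import math

def norm_pdf(v: float) -> float:
    """Standard normal density phi(v)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v: float) -> float:
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def heckman_expectations(xb: float, zg: float, rho: float, sigma: float) -> dict:
    """Conditional means for one observation in a Heckman selection model.

    xb    : linear predictor of the outcome equation, x'beta
    zg    : linear predictor of the selection equation, z'gamma
    rho   : correlation between the two error terms
    sigma : standard deviation of the outcome error
    Assumption: E(y) treats unobserved outcomes as 0.
    """
    p_obs = norm_cdf(zg)               # Pr(y observed)
    mills = norm_pdf(zg) / p_obs       # inverse Mills ratio lambda(z'gamma)
    e_latent = xb                      # E(y*): latent outcome
    e_cond = xb + rho * sigma * mills  # E(y | y observed)
    e_obs = p_obs * e_cond             # E(y), unobserved y counted as 0
    return {"Pr(observed)": p_obs, "E(y*)": e_latent,
            "E(y|observed)": e_cond, "E(y)": e_obs}

if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    res = heckman_expectations(xb=2.0, zg=0.0, rho=0.5, sigma=1.0)
    for name, value in res.items():
        print(f"{name:>14}: {value:.4f}")
```

With rho = 0 the selection term drops out and E(y | y observed) equals E(y*). This is why the three quantities can tell quite different stories once rho is far from zero.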
> 2) Is there any way to test the bivariate normality of the
> error terms for the maximum likelihood estimation in Stata?

No; where would that information come from? Think about the
selection equation: the empirical information about the 
distribution of the "error term" is only there in the form 
of the shape of the relationship between the probability 
and the explanatory variables. That is just not enough to 
build a reliable test.
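A quick way to see how little the shape of that relationship reveals is to compare two quite different error distributions. The sketch below is an illustration swapped in for this argument, not a test of bivariate normality. It compares the standard normal CDF with a logistic CDF rescaled by 1.702, a classic approximation constant. Over the whole grid the two probability curves stay less than 0.01 apart, so data on Pr(y observed) alone can hardly tell normal from logistic errors.

```python
import math

def normal_cdf(v: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def logistic_cdf(v: float, scale: float = 1.702) -> float:
    """Logistic CDF evaluated at scale * v; 1.702 lines it up with the normal."""
    return 1.0 / (1.0 + math.exp(-scale * v))

def max_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 12000) -> float:
    """Largest absolute gap between the two curves on an evenly spaced grid."""
    gap = 0.0
    for i in range(steps + 1):
        v = lo + (hi - lo) * i / steps
        gap = max(gap, abs(normal_cdf(v) - logistic_cdf(v)))
    return gap

if __name__ == "__main__":
    print(f"max |Phi(v) - logistic(1.702 v)| on grid: {max_gap():.4f}")
```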

> 3) While Stata twostep option automatically corrects
> standard errors after the inverse Mills ratio enters the
> regression as estimated parameter (i.e. bootstrapping is not
> necessary), the twostep does not allow robust estimation.
> This seems to suggest that running Heckman manually
> (Probit+OLS with robust s.e. and bootstrap, say 1000
> replications) could be a better option for inference. Is it
> so?

No. The bootstrap won't use the information from the robust
standard errors, and the point estimates will be exactly the 
same as in the models without robust standard errors. So this 
procedure will not get you what you want. Moreover, robust
standard errors are not so great as to be worth going through
any special effort (with the danger of introducing bugs).
Some people think that robust standard errors are the 
greatest thing since sliced bread, others think they are 
evil (and most hold a position somewhere in between). See 
for example: 
Freedman, D.A. (2006) On the So-Called "Huber Sandwich 
Estimator" and "Robust Standard Errors". The American 
Statistician, 60(4): 299-303.
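The reason the bootstrap ignores robust standard errors is that a bootstrap standard error is just the standard deviation of the point estimates across resamples. No variance estimate from the fitted model ever enters. The sketch below shows this with a plain OLS slope instead of a two-step Heckman model, a simplification assumed here for brevity. The names `ols_slope` and `bootstrap_se` and the simulated data are hypothetical.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Point estimate of b in y = a + b*x; no standard error is computed."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_se(xs, ys, estimator, reps=1000, seed=12345):
    """Bootstrap standard error: SD of the point estimates across resamples.

    Only the estimator's point estimate is used, so whether the model would
    have reported classical or robust standard errors makes no difference.
    """
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(estimator([xs[i] for i in idx], [ys[i] for i in idx]))
    return statistics.stdev(estimates)

if __name__ == "__main__":
    # Hypothetical simulated data with heteroskedastic errors.
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [1.0 + 0.5 * x + rng.gauss(0, 0.2 + 0.3 * x) for x in xs]
    print("slope:", round(ols_slope(xs, ys), 3))
    print("bootstrap SE:", round(bootstrap_se(xs, ys, ols_slope), 3))
```

Swapping in an estimator that also returned a robust variance would leave `bootstrap_se` unchanged, because it only ever looks at the point estimate.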

> 4) The robust MLE is less general than the two-step, yet it
> seems to be preferred apart from when the estimated rho
> approaches 1. Which value for rho is "big enough" to suggest
> the use of the twostep procedure?

Again, don't get too carried away with those robust standard
errors. If you can use them without doing anything you don't
want to do, then they probably don't do too much harm, 
otherwise just forget about them.

Regarding rho: rho is a correlation, so when rho is close to 
1 (or -1) there is very little information you can use to 
distinguish between the two error terms. Use what you know 
about correlations to make that judgement. 

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

