Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)

From   Austin Nichols
Subject   Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)
Date   Wed, 5 Aug 2009 12:03:35 -0400

Shehzad, John, et al.:
Shehzad's code has some typos (compare the if qualifiers), I think,
and without a ref or a proof of consistency, I can't see how anyone
would get those kinds of results published.  To be able to trust them
yourself, you would also want to run some simulations to assess finite
sample performance for samples that look like yours (with true coefs
picked to be near your estimated coefs).  Also, note that the original
poster specifies panel data, though what the DGP is, I do not know.
Note in particular that the dependent var is a weighted average of
spreads for one or multiple bonds in a given quarter but a weighted
average of a nonnegative (skewed?) variable is not guaranteed to have
any desirable properties.  Also, still assuming that the dep var is
nonnegative or strictly positive, OLS and heckman are inappropriate,
relative to a GLM type model.  Presumably, the new GMM models in Stata
11 are a good place to turn, assuming suitable moments can be

On Wed, Aug 5, 2009 at 4:52 AM, Shehzad Ali wrote:
> To add to John's response, if your endogenous variable is binary, then I would use the following:
>        probit y1 x1 x2 x3
>        predict xbeta1, xb
>        gen imills1=normd(xb)/normprob(xb) if y1==1
>        replace imills1=-normd(xb)/(1-normprob(xb)) if y2==0
>        heckman y2 y1 $yvar $zvar imills1 [pw=weight], sel(selection_probit= y1 $xvar imills1) cluster(commune) mills(imr2)
> I have assumed that the endogenous var is endogenous in both selection and outcome equation.
> Regards,
> Shehzad
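A minimal sketch of the first-stage step in the quoted code, with both if qualifiers conditioning on the same variable and the generated term built from the variable that -predict- actually creates. This is an illustration on the auto dataset shipped with Stata, not Shehzad's data or code: foreign stands in for the binary endogenous variable y1, and weight and mpg stand in for x1-x3. The name gres1 is made up here, and normalden() and normal() are the current names of the functions written as normd() and normprob() in the quoted code.

```stata
* Illustration only: the auto data stand in for the thread's variables
sysuse auto, clear

* First-stage probit for the binary (assumed endogenous) variable
probit foreign weight mpg
predict double xbeta1, xb

* Generalized residual: phi/Phi where the variable is 1, -phi/(1-Phi) where it is 0
gen double gres1 = normalden(xbeta1)/normal(xbeta1) if foreign==1
replace gres1 = -normalden(xbeta1)/normal(-xbeta1) if foreign==0   // normal(-z) = 1 - normal(z)
```

Whether adding such a term to both equations of -heckman- gives consistent estimates is a separate question that this sketch does not address.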
> ----- Original Message ----
>> From: John Antonakis
>> Sent: Wednesday, August 5, 2009 7:18:14 AM
>> Subject: Re: st: Sample selection and endogeneity   (or, combining heckman and ivreg)
>> Hi:
>> One possibility is to manually obtain predicted values of the endogenous
>> variables (using regress), which will give you consistent estimates.
>> Then use the predicted values in the Heckman model and bootstrap the
>> standard errors.
>> HTH,
>> John.
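A rough sketch of the procedure John describes: a first-stage -regress-, predicted values plugged into -heckman-, and both steps wrapped in one program so that -bootstrap- re-runs the first stage in every replication. Everything here is assumed for illustration: the simulated data, the variable names (id, z, w, x, u, s), the coefficient values, the program name plugin_heck, the twostep option, and the choice to resample whole panel units with cluster(). The sketch does not establish that this plug-in approach is consistent in any given application.

```stata
* Hypothetical data: 40 panel units, 50 periods each
clear
set seed 12345
set obs 2000
gen int id = ceil(_n/50)
gen double z = rnormal()                  // instrument, excluded from the outcome equation
gen double w = rnormal()                  // affects selection only
gen double x = rnormal()                  // exogenous covariate
gen double u = rnormal()                  // shock shared by y1, selection, and y2
gen double y1 = 0.7*z + 0.5*x + 0.5*u + rnormal()
gen byte   s  = (0.8*w + 0.5*x + 0.5*u + rnormal() > 0)
gen double y2 = 1 + 1*y1 + 0.5*x + u + rnormal() if s == 1

capture program drop plugin_heck
program define plugin_heck, rclass
    tempvar y1hat
    quietly regress y1 z w x              // first stage on all exogenous variables
    quietly predict double `y1hat', xb
    quietly heckman y2 `y1hat' x, select(s = z w x) twostep
    return scalar b_y1 = [y2]_b[`y1hat']  // coefficient on the predicted regressor
end

* Resample whole panel units and repeat both steps in each replication
bootstrap b_y1=r(b_y1), reps(200) seed(2009) cluster(id): plugin_heck
```

Because the whole program is bootstrapped rather than only the -heckman- call, the reported standard error for b_y1 reflects sampling variation in both steps.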
>> On 05.08.2009 04:51, kokootchke wrote:
>> > Dear all,
>> >
>> > I am trying to estimate an equation in which the dependent variable is
>> > only observed when a selection rule applies (your typical sample
>> > selection problem a la Heckman). One of the independent variables in the
>> > main equation is endogenous, and I'd like to use instrumental variables
>> > to address that issue within the Heckman framework.
>> >
>> > I haven't been able to find any papers or references that deal with this
>> > issue, especially because I have a panel dataset containing 40+ countries
>> > and about 60 time periods (quarters). My approach is to run the selection
>> > probit, then use the predicted values in a 2SLS framework. I guess I'd
>> > have to do some standard-error correction (any hints on this would also
>> > be useful)... but I wanted to ask if you guys could tell me whether there
>> > is a Stata command that does this or if there are any references you
>> > could suggest?
>> >
>> > For more information on my particular case, please see below.
>> >
>> > Thanks!
>> > Adrian
>> >
>> >
>> > p.s. A few more details on my model:
>> >
>> > I want to estimate the effects of GDP growth and other macroeconomic
>> > variables on bond spreads, so my dependent variable in the main equation
>> > is the yield spread of a bond. The problem is that these spreads are
>> > primary market spreads or "spreads at launch", which means they are only
>> > observed at the moment a country places a bond in the market.
>> >
>> > My panel data are organized at a quarterly frequency. Whether a country
>> > issues one or multiple bonds in a given quarter is irrelevant as I
>> > basically take a weighted average of all spreads issued in a given
>> > quarter and use that as my dependent variable.
>> >
>> > However, there are quarters when a country may not issue a bond... and
>> > this is the selection problem I'm trying to get at using a Heckman model.
>> >
>> > On top of this, if we believe that the spreads are somehow related to the
>> > level of interest rates in the country, then macroeconomic variables such
>> > as GDP growth are going to be endogenous. I have one (potentially two)
>> > instrumental variable I want to use, and this is why I want to do the
>> > 2SLS...
>> >
>> > Do you guys have any other suggestions besides what I suggested above?
