# RE: st: Sample selection and endogeneity (or, combining heckman and ivreg)

| From | kokootchke |
| --- | --- |
| To | statalist |
| Subject | RE: st: Sample selection and endogeneity (or, combining heckman and ivreg) |
| Date | Thu, 6 Aug 2009 14:36:06 -0400 |

```
Dear Austin,

Thanks A LOT for your comprehensive response. I've read it once and I'll probably need to read it multiple times to understand it better, but it all makes sense. Yeah, I thought that for the simulations I would have to make distributional assumptions on my dependent variable, which may be a strong assumption... and I also agree that it definitely doesn't look normal... it's rather skewed to the left, precisely because of the selection issue, I presume...

Thank you very much... let me look into this.

----------------------------------------
> Date: Thu, 6 Aug 2009 12:21:23 -0400
> Subject: Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)
> From: austinnichols@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> For GLM and GMM, you can read the Stata 11 manual entries for -gmm-
> and -glm- and refs cited therein, or Cameron and Trivedi (two books
> available from Stata's bookstore).
>
> You can run a simulation for some independent normally distributed X
> variables and get one result, then run for some data that looks like
> yours and get a totally different result, so it makes sense to use
> data that looks like yours (same covariance structure). The easiest
> modifications would be: you specify the errors and the coefficients,
> so you know the true relationship between X and y, then you try to
> estimate it.
> The simulation comes in because you specify distributions for error
> terms and then you draw all the error terms needed 100 times, or
> (better) 10000 times, to assess the distribution of estimated coefs
> around true coefs, and rejection rates.
>
> For example (note I don't have your data, so I start by making data up
> with -drawnorm-):
>
> clear all
> prog pheck, rclass
>   syntax [, Corr(real .1) ]
>   * draw correlated errors u and v: 2400 obs in 60 clusters
>   matrix C = (1, `corr' \ `corr' , 1)
>   drawnorm u v, n(2400) corr(C) clear
>   g long i=mod(_n-1,60)+1
>   egen mv=mean(v), by(i)
>   forv i=2/5 {
>     g x`i'=rnormal()
>   }
>   * x1 is endogenous (depends on v via mv); y1 is a binary endogenous regressor
>   g x1=mv+x2+rnormal()
>   g y1=(-x1/5-x3/5+u>0)
>   g y2star=(y1/5+x1/5+x4/5+x5/5+v)
>   * y2 is observed only when the selection indicator s is 1
>   g s=(v+x1/5+x2/5+x3/5>0)
>   g y2=y2star if s
>   * (1) OLS on the selected sample
>   reg y2 y1 x1 x4 x5, cluster(i)
>   foreach v of varlist y1 x1 x4 x5 {
>     return scalar rb_`v'=_b[`v']
>     return scalar rs_`v'=_se[`v']
>   }
>   * test the true value of the x1 coef; reject at the 5% level
>   test x1=.2
>   return scalar rrej_x1=(r(p)<.05)
>   * (2) ad hoc method: probit for y1, then -heckman- with the generalized residual im
>   probit y1 x1 x3
>   predict double xbeta1, xb
>   predict p
>   gen double im=normalden(xb)/normprob(xb) if y1==1
>   replace im=-normalden(xb)/(1-normprob(xb)) if y1==0
>   heckman y2 y1 x1 x4 x5 im, sel(x1 x2 im) cluster(i) iterate(1000)
>   if e(cmd) == "heckman" {
>     if e(converged) == 1 {
>       foreach v of varlist y1 x1 x4 x5 {
>         return scalar hb_`v'=_b[`v']
>         return scalar hs_`v'=_se[`v']
>       }
>       test x1=.2
>       return scalar hrej_x1=(r(p)<.05)
>     }
>   }
>   * (3) IV by two-step GMM, instrumenting y1 and x1 with p, x2, x3
>   ivreg2 y2 (y1 x1=p x2 x3) x4 x5, gmm2s cluster(i)
>   foreach v of varlist y1 x1 x4 x5 {
>     return scalar ib_`v'=_b[`v']
>     return scalar is_`v'=_se[`v']
>   }
>   test x1=.2
>   return scalar irej_x1=(r(p)<.05)
>   eret clear
> end
> set seed 1
> pheck
> simul, rep(100): pheck
> tw kdensity ib_x1 || kdensity hb_x1 || kdensity rb_x1, xli(.2)
> su *b_x1 *rej* *b_y1, sep(3)
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        ib_x1 |       100    .1538109    .0398201     .08462   .2944242
>        hb_x1 |        58    .4007295    .0309691   .3406418    .499454
>        rb_x1 |       100    .0452054    .0146647   .0097073   .1013072
> -------------+--------------------------------------------------------
>      irej_x1 |       100         .43    .4975699          0          1
>      hrej_x1 |        58           1           0          1          1
>      rrej_x1 |       100           1           0          1          1
> -------------+--------------------------------------------------------
>        ib_y1 |       100    1.686989    .4401385   .9865659   3.472792
>        hb_y1 |        58    2.718954    .3095008   2.231136   3.557839
>        rb_y1 |       100    .2975349    .0352731   .2353329   .3744754
>
> In this example, IV gets closest to the true coef on x1 of 0.2 (mean
> estimate 0.154) but overrejects by a huge margin, rejecting the true
> value in 43% of replications (IV typically has a fraction of the OLS
> bias in finite samples), while both OLS and the ad hoc method using
> -heckman- do a terrible job, rejecting the true value every time (and
> -heckman- converges inside 1000 iterations in only 58 of 100
> replications, so the code takes forever to run).
> OLS looks better than IV and the ad hoc method for the coef on y1
> (mean 0.30 against a true value of 0.2), but none of the methods
> performs adequately.
>
> For your case, I would forget about the selection problem, and run
> some panel data model with instruments. If you want to take a "more
> correct" GMM approach and stack equations for the count of number of
> bonds issued in a period and equations for spreads (or log-spreads) on
> those bonds, you will need to find a coauthor, I suspect. But the
> -gmm- command in Stata 11 will help, probably.
>
>
> On Thu, Aug 6, 2009 at 2:30 AM, kokootchke wrote:
>> Austin, thank you very much for your response. I agree that not having
>> a reference would weaken my results and this is why I'm trying to see
>> if someone in this Stata group can point in the right direction. I have
>> thought about the simulations as well and I'm contemplating doing that,
>> but I've never done this before and would like some pointers as to
>> where I should start. Would you have any suggestions or do you have a
>> reference that could help in that regard? Also, what do you mean by "for samples that look like yours"?
>
>> This is a very good point. I have also never used GLM/GMM in this context before, so could you please be more specific regarding what I need to know or where I should look in order to consider this option and try to implement it?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

```
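The rejection-rate logic in the quoted simulation can be illustrated with a stripped-down case. What follows is a minimal sketch, not Austin's code: the program name `olssim`, the sample size of 500, the 1000 replications, and the 5% test level are all assumptions chosen for illustration. It simulates a correctly specified OLS regression with a true coefficient of 0.2, so the share of replications rejecting the true value should land near the chosen test level, which gives a baseline for reading the much higher rates reported in the thread.

```stata
clear all

* olssim (hypothetical name): one replication of a well-behaved OLS setup
program define olssim, rclass
    drop _all
    set obs 500                      // assumed sample size
    generate x = rnormal()
    generate y = 0.2*x + rnormal()   // true coefficient on x is 0.2
    regress y x
    return scalar b = _b[x]
    * Wald test of the true value; record whether it is rejected at 5%
    test x = 0.2
    return scalar rej = (r(p) < 0.05)
end

set seed 1
* collect the estimate and the rejection indicator from each replication
simulate b=r(b) rej=r(rej), reps(1000) nodots: olssim

* mean of b shows bias; mean of rej is the simulated rejection rate
summarize b rej
```

Replacing the data-generating lines with draws that mimic the covariance structure of the real data, as suggested in the thread, turns this baseline into a check on a specific estimator.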