
From: kokootchke <kokootchke@hotmail.com>
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Sample selection and endogeneity (or, combining heckman and ivreg)
Date: Thu, 6 Aug 2009 14:36:06 -0400

Dear Austin,

Thanks A LOT for your comprehensive response. I've read it once and I'll probably need to read it multiple times to understand it better, but it all makes sense.

Yeah, I thought that for the simulations I would have to make distributional assumptions on my dependent variable, which may be a strong assumption... and I also agree that it definitely doesn't look normal... it's rather skewed to the left, precisely because of the selection issue, I presume...

Thank you very much... let me look into this.

----------------------------------------
> Date: Thu, 6 Aug 2009 12:21:23 -0400
> Subject: Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)
> From: austinnichols@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> Adrian :
> For GLM and GMM, you can read the Stata 11 manual entries for -gmm-
> and -glm- and refs cited therein, or Cameron and Trivedi (two books
> available from Stata's bookstore).
>
> You can run a simulation for some independent normally distributed X
> variables and get one result, then run for some data that looks like
> yours and get a totally different result, so it makes sense to use
> data that looks like yours (same covariance structure)--it's easiest
> to just start with your data, and modify it as needed. The
> modifications would be: you specify the errors and the coefficients,
> so you know the true relationship between X and y, then you try to
> estimate it.
> The simulation comes in because you specify distributions for error
> terms and then you draw all the error terms needed 100 times, or
> (better) 10000 times, to assess the distribution of estimated coefs
> around true coefs, and rejection rates.
>
> For example (note I don't have your data, so I start by making data up
> with -drawnorm-):
>
> clear all
> prog pheck, rclass
>  syntax [, Corr(real .1) ]
>  matrix C = (1, `corr' \ `corr' , 1)
>  drawnorm u v, n(2400) corr(C) clear
>  g long i=mod(_n-1,60)+1
>  egen mv=mean(v), by(i)
>  forv i=2/5 {
>   g x`i'=rnormal()
>  }
>  g x1=mv+x2+rnormal()
>  g y1=(-x1/5-x3/5+u>0)
>  g y2star=(y1/5+x1/5+x4/5+x5/5+v)
>  g s=(v+x1/5+x2/5+x3/5>0)
>  g y2=y2star if s
>  reg y2 y1 x1 x4 x5, cluster(i)
>  foreach v of varlist y1 x1 x4 x5 {
>   return scalar rb_`v'=_b[`v']
>   return scalar rs_`v'=_se[`v']
>  }
>  test x1=.2
>  return scalar rrej_x1=(r(p)
>  probit y1 x1 x3
>  predict double xbeta1, xb
>  predict p
>  gen double im=normalden(xb)/normprob(xb) if y1==1
>  replace im=-normalden(xb)/(1-normprob(xb)) if y1==0
>  heckman y2 y1 x1 x4 x5 im, sel(x1 x2 im) cluster(i) iterate(1000)
>  if e(cmd) == "heckman" {
>   if e(converged) == 1 {
>    foreach v of varlist y1 x1 x4 x5 {
>     return scalar hb_`v'=_b[`v']
>     return scalar hs_`v'=_se[`v']
>    }
>    test x1=.2
>    return scalar hrej_x1=(r(p)
>   }
>  }
>  ivreg2 y2 (y1 x1=p x2 x3) x4 x5, gmm2s cluster(i)
>  foreach v of varlist y1 x1 x4 x5 {
>   return scalar ib_`v'=_b[`v']
>   return scalar is_`v'=_se[`v']
>  }
>  test x1=.2
>  return scalar irej_x1=(r(p)
>  eret clear
> end
> set seed 1
> pheck
> simul,rep(100):pheck
> tw kdensity ib_x1 || kdensity hb_x1 || kdensity rb_x1, xli(.2)
> su *b_x1 *rej* *b_y1, sep(3)
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        ib_x1 |       100    .1538109    .0398201     .08462   .2944242
>        hb_x1 |        58    .4007295    .0309691   .3406418    .499454
>        rb_x1 |       100    .0452054    .0146647   .0097073   .1013072
> -------------+--------------------------------------------------------
>      irej_x1 |       100         .43    .4975699          0          1
>      hrej_x1 |        58           1           0          1          1
>      rrej_x1 |       100           1           0          1          1
> -------------+--------------------------------------------------------
>        ib_y1 |       100    1.686989    .4401385   .9865659   3.472792
>        hb_y1 |        58    2.718954    .3095008   2.231136   3.557839
>        rb_y1 |       100    .2975349    .0352731   .2353329   .3744754
>
> In this example, IV gets close to the true coef on x1 of 0.2 but
> overrejects by a huge margin (IV typically has a fraction of the OLS
> bias in finite samples), while both OLS and the ad hoc method using
> -heckman- do a terrible job (and -heckman- doesn't converge inside
> 1000 iterations in many cases, so the code takes forever to run).
> OLS looks better than IV and the ad hoc method for the coef on y1, but
> none of the methods performs adequately.
>
> For your case, I would forget about the selection problem, and run
> some panel data model with instruments. If you want to take a "more
> correct" GMM approach and stack equations for the count of number of
> bonds issued in a period and equations for spreads (or log-spreads) on
> those bonds, you will need to find a coauthor, I suspect. But the
> -gmm- command in Stata 11 will help, probably.
>
>
> On Thu, Aug 6, 2009 at 2:30 AM, kokootchke wrote:
>> Austin, thank you very much for your response. I agree that not having
>> a reference would weaken my results and this is why I'm trying to see
>> if someone in this Stata group can point in the right direction. I have
>> thought about the simulations as well and I'm contemplating doing that,
>> but I've never done this before and would like some pointers as to
>> where I should start.
>> Would you have any suggestions or do you have a
>> reference that could help in that regard? Also, what do you mean by
>> "for samples that look like yours"?
>
>> This is a very good point. I have also never used GLM/GMM in this
>> context before, so could you please be more specific regarding what I
>> need to know or where I should look in order to consider this option
>> and try to implement it?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
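
For reference, the data-generating process implied by the quoted -pheck- program can be written out as below. This is a reading of the code rather than something stated in the thread, and the notation is an assumption introduced here: \rho stands for the `corr' option (default 0.1), \bar{v}_i for the mean of v within group i (60 groups of 40 observations), and \varepsilon for the extra rnormal() draw in x1.

```latex
\begin{aligned}
(u, v) &\sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},
          \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),
          \qquad \rho = 0.1 \text{ by default},\\
x_j &\sim N(0,1), \quad j = 2,\dots,5, \qquad \varepsilon \sim N(0,1),\\
x_1 &= \bar{v}_i + x_2 + \varepsilon,\\
y_1 &= \mathbf{1}\{-x_1/5 - x_3/5 + u > 0\},\\
y_2^{*} &= y_1/5 + x_1/5 + x_4/5 + x_5/5 + v,\\
s &= \mathbf{1}\{v + x_1/5 + x_2/5 + x_3/5 > 0\},\\
y_2 &= y_2^{*} \ \text{if } s = 1, \ \text{missing otherwise}.
\end{aligned}
```

Under this reading, the true coefficients on both x1 and y1 in the outcome equation are 0.2, which is the benchmark for the ib_, hb_, and rb_ rows of the summary table. x1 is endogenous because it contains the group mean of v, and y1 is endogenous because it depends on x1 and on u, which is correlated with v.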

**References:**

- **st: Poisson vs. Linear regression for comparing rates** *From:* Ashwin Ananthakrishnan <ashwinna@yahoo.com>
- **Re: st: Poisson vs. Linear regression for comparing rates** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
- **st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* kokootchke <kokootchke@hotmail.com>
- **Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* John Antonakis <john.antonakis@unil.ch>
- **Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* Shehzad Ali <drshehzad_ali@yahoo.com>
- **Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* Austin Nichols <austinnichols@gmail.com>
- **RE: st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* kokootchke <kokootchke@hotmail.com>
- **Re: st: Sample selection and endogeneity (or, combining heckman and ivreg)** *From:* Austin Nichols <austinnichols@gmail.com>

