 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: problem with binary choice models with a binary endogenous regressor

From: Erkan Duman
To: statalist@hsphsun2.harvard.edu
Subject: st: problem with binary choice models with a binary endogenous regressor
Date: Wed, 9 Apr 2014 12:53:57 +0300

```Hello Statalisters,
I am working on the impacts of migration on academic performance of
children in migrant families. The dependent variable is whether the
child attends school and the independent variable of interest is
whether the household receives remittances. Both variables are
binary, and the regressor of interest is endogenous. I use
historical migration rates at NUTS-2 level as an instrumental
variable. This instrument is assumed to be positively correlated with
the state of receiving remittances and uncorrelated with the error.

The literature suggests several methods for this setting, such as
IV 2SLS, bivariate probit and the special regressor method. None of them
worked for me. With IV 2SLS, the coefficient on the endogenous regressor
falls outside the (0,1) range.

In the bivariate probit method, my specification does not satisfy the
joint normality of errors assumption, so the estimated coefficients
are severely biased.

I tried the semi-parametric estimation method proposed by Gallant and
Nychka. The corresponding Stata command, snp2, was written by Giuseppe De
Luca. The semi-parametric estimation method relaxes the joint
normality of errors assumption. However, the command uses maximum
likelihood estimation and it never converges in my sample. I restricted
the iterations to 1 to see whether I could manage to estimate the
marginal effects. However, the "mfx compute" command, which
De Luca suggests for estimating marginal effects after running snp2,
gives an error which is
oldest_girl_young is the dependent variable in my regression.

In the special regressor method, using the household head's age as the
special regressor, I could estimate the coefficients. However, the
coefficient is again outside the (0,1) range, whereas Christopher Baum
suggests that the special regressor method by construction does not have
this out-of-range coefficient problem. When I try to estimate the
marginal effects of the regressors using the bootstrap option in
sspecialreg, Stata gives me a "conformability error". So although the
command by Christopher F. Baum (sspecialreg in Stata) estimates the
coefficients, it cannot estimate the marginal effects.

I believe the problem is due to the rareness of the treatment in my
sample. Approximately 1,500 households out of 100,000 households
receive remittances. That is a ratio of 1.5%. Gary King has worked on
this rare events problem. In my study, the first stage of the IV 2SLS
method corresponds to what Gary King refers to as a rare event problem:
the dependent variable has a distribution where the ratio of 1's to
0's is below 5%. However, he couldn't help me in solving my problem.

My belief that the rareness of the treatment causes my problems is
also based on the probit output of the first-stage
regression. After I ran a probit regression of receiving remittances
on the instrument and the other exogenous variables from the second stage,
the "estat class" command showed that I never correctly predict
the households that receive remittances. That is, for the observations
that receive remittances, the predicted probabilities never exceed
0.5. Actually, the predicted probabilities of receiving remittances
after the probit regression range from -0.1 to 0.1.

I seek help in finding a method which will consistently estimate the
coefficients and the marginal effects.

I think that semi-parametric methods that are appropriate for binary
choice models with binary endogenous regressors, and that do not use
maximum likelihood estimation, may solve my problem. For example, there
is the Klein and Spady semi-parametric estimation method, which uses
kernel density estimates.
Could anyone help me in finding Stata code for such semi-parametric methods?

Any kind of help will be appreciated.
Thank you
Best regards.

--
Erkan Duman