# st: RE: margins command after "control function" IV negative binomial

From: "Wooldridge, Jeffrey"
Subject: st: RE: margins command after "control function" IV negative binomial
Date: Thu, 3 Mar 2011 06:51:01 -0500

```
Some quick comments about the CF approach. You should only be obtaining
one control function, from the regression

reg Y1    X1   IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict  y1_resid, resid

If you believe the reduced form for Y1 is linear with an additive,
independent error, then adding this residual to the Y2 model controls
for endogeneity of any function of Y1 on the RHS of the Y2 equation. Of
course, you might want to put the CF in the second model in a flexible
way. One of the benefits of the CF approach is that it is a parsimonious
way to allow for many nonlinear functions of Y1. I hope I do a better
job of explaining this in 2e of my MIT Press book. See also my NBER and
UCL/IFS lectures with Imbens.

Some theorists will complain about your approach because if the
underlying "structural" model for Y2 is NB then the estimable model with
the CF (residual) added cannot be NB -- unless you make a restrictive
distributional assumption. This does not bother me so much.

You will have to compute the average marginal effects "by hand." This is
not hard with an exponential mean function. Take the derivative with
respect to Y1 to get

(b1 + b2*X1)*exp(.)

where b1 and b2 are the coefficients on Y1 and Y1_X1, and then average
across all of your data. Use the bootstrap for a proper standard error.

JW

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
rlhall@umich.edu
Sent: Wednesday, March 02, 2011 5:52 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: margins command after "control function" IV negative
binomial

I am estimating a negative binomial model with two endogenous
regressors, the second of which is an interaction between the
endogenous variable and an exogenous term.  I am using what Wooldridge
calls the control function approach.  Cameron and Trivedi  describe
its implementation in Stata  (2010, 607-609).

The two endogenous variables are Y1 and Y1_X1 (Y1 interacted with an
exogenous variable). Both are interval-level variables.

The system is overidentified. I estimate the first-stage equations
(IV = instrument; IV1_X1 = IV1*X1, IV2_X1 = IV2*X1).

reg Y1    X1   IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict  y1_resid, resid

reg Y1_X1  X1  IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict y1x1_resid, resid

I then estimate the 2nd stage:

nbreg Y2   X1 Y1 Y1_X1   y1_resid  y1x1_resid, vce(cluster ID)

The problem comes in calculating the predicted counts at various
levels of the key variables, e.g.,:

margins, predict(n) atmeans at(X1=2 Y1=5 Y1_X1=10)

This produces huge predicted counts, often several times the maximum
predicted count from the model (predict nbreg_hat, n).

Insofar as I can tell, the problem arises because I am setting the
values of the endogenous variables but letting margins set the two
residual terms at their means. (The endogenous variables and their
respective first stage residuals are correlated at about .7).  But if
that's the mistake, I don't know at what values I should set the
residual terms to get the correct predicted counts.    (Setting the
residual terms at the same values as the respective endogenous terms
does not seem to produce sensible results either).

Thanks in advance for any light you might shed on this problem... or
guidance toward an altogether different approach.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
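The recipe in the reply (a single first-stage residual, the NB second stage, and the average marginal effect of Y1 computed by hand and then bootstrapped) could be sketched in Stata roughly as follows. This is not code from the thread. The program name `cf_ame`, the simulated data, every coefficient value, and the choice of 200 replications are assumptions made for illustration; only the variable names follow the original post. The sketch also assumes the second-stage mean is exp(.) of a linear index that includes the residual, which is what `nbreg` fits.

```stata
* Sketch only: simulated data stand in for the poster's dataset.
* All coefficient values and the cluster layout are arbitrary assumptions.
clear
set seed 2011
set obs 2000
* 200 clusters of 10 observations each
generate ID     = ceil(_n/10)
generate X1     = rnormal()
generate IV1    = rnormal()
generate IV2    = rnormal()
* v is the first-stage error
generate v      = rnormal()
generate Y1     = 0.5*X1 + 0.6*IV1 + 0.4*IV2 + v
generate Y1_X1  = Y1*X1
generate IV1_X1 = IV1*X1
generate IV2_X1 = IV2*X1
* v also enters the mean of Y2, which is what makes Y1 endogenous;
* the gamma multiplier (mean 1) adds overdispersion
generate mu = exp(0.2 + 0.1*X1 + 0.2*Y1 + 0.05*Y1_X1 + 0.3*v)
generate Y2 = rpoisson(mu*rgamma(2, 0.5))

* Hypothetical helper: both stages plus the hand-computed AME in one
* program, so that the bootstrap re-estimates the first stage every time.
capture program drop cf_ame
program define cf_ame, rclass
    tempvar vhat xb me
    * first stage: one regression for Y1; its residual is the control function
    regress Y1 X1 IV1 IV2 IV1_X1 IV2_X1
    predict double `vhat', residuals
    * second stage: negative binomial with the residual added
    nbreg Y2 X1 Y1 Y1_X1 `vhat'
    * derivative of exp(xb) with respect to Y1: (b1 + b2*X1)*exp(xb),
    * evaluated for each observation at its own residual
    predict double `xb', xb
    generate double `me' = (_b[Y1] + _b[Y1_X1]*X1)*exp(`xb')
    * average over the estimation sample
    summarize `me' if e(sample), meanonly
    return scalar ame = r(mean)
end

* resample whole clusters, mirroring vce(cluster ID) in the original post
bootstrap ame = r(ame), reps(200) cluster(ID) seed(2011): cf_ame
```

Wrapping both stages in one program is a design choice, so that the bootstrap standard error reflects the sampling error in the first-stage residual. More replications than 200 would usually be advisable in real work. For predicted counts at chosen values of Y1 and X1, a similar program could presumably fix those values, keep each observation's own residual, and average exp(.) over the sample, rather than setting the residual to its mean.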