
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: margins command after "control function" IV negative binomial

From   "Wooldridge, Jeffrey" <>
To   <>
Subject   st: RE: margins command after "control function" IV negative binomial
Date   Thu, 3 Mar 2011 06:51:01 -0500

Some quick comments about the CF approach. You should obtain only
one control function, from the reduced-form regression

reg Y1    X1   IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict  y1_resid, resid

If you believe the reduced form for Y1 is linear with an additive,
independent error, then adding this residual to the model controls for
endogeneity of any function of Y1 on the RHS of the Y2 equation,
including Y1_X1. Of course, you might want to put the CF in the second
model in a flexible way. One of the benefits of the CF approach is that
it is a parsimonious way to allow for many nonlinear functions of Y1. I
hope I do a better job of explaining this in the second edition of my
MIT Press book. See also my NBER and UCL/IFS lectures with Imbens.
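A minimal Stata sketch of the single-control-function setup with the CF entered flexibly. It runs on simulated data so that it stands alone: the data-generating process and its coefficients are made up, and only the variable names follow the thread. Writing the interaction as c.Y1##c.X1 (rather than as a separate Y1_X1 variable) is an assumption of this sketch, and so is the particular flexible form chosen for the CF (level, square, and interaction with X1).

```stata
* Made-up data so the sketch runs on its own; only the names follow the thread
clear
set seed 12345
set obs 2000
generate ID  = ceil(_n/4)                 // 500 clusters of 4 observations
generate X1  = runiform(0, 4)
generate IV1 = rnormal()
generate IV2 = rnormal()
generate v   = rnormal()                  // reduced-form error of Y1
generate Y1  = 1 + 0.5*X1 + IV1 + 0.5*IV2 + 0.3*IV1*X1 + v
generate Y2  = rpoisson(exp(-1 + 0.2*Y1 + 0.05*Y1*X1 + 0.1*X1 + 0.5*v)*rgamma(2, 0.5))

* First stage: one reduced form for Y1, hence one control function
regress Y1 X1 IV1 IV2 c.IV1#c.X1 c.IV2#c.X1, vce(cluster ID)
predict double y1_resid, residuals

* Second stage: Y1 and Y1*X1 via factor variables; CF entered as
* a level, a square, and an interaction with X1
nbreg Y2 c.Y1##c.X1 c.y1_resid##c.y1_resid c.y1_resid#c.X1, vce(cluster ID)

* Joint Wald test that all control-function terms are zero
test y1_resid c.y1_resid#c.y1_resid c.y1_resid#c.X1
```

The closing test is a joint Wald test that every CF term is zero, a simple check on whether Y1 is endogenous; under that null the second-stage standard errors need no correction for the estimated residual.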

Some theorists will complain about your approach: if the underlying
"structural" model for Y2 is NB, then the estimable model with the CF
(residual) added cannot be NB, unless you make a restrictive
distributional assumption. This does not bother me so much.

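You will have to compute the average marginal effects "by hand." This is
not hard with an exponential mean function. Just take the derivative
with respect to Y1, holding the CF fixed, to get

(b1 + b2*X1)exp(.)

where b1 and b2 are the coefficients on Y1 and Y1_X1 and exp(.) is the
fitted exponential mean for each observation. Then average this across
all of your data. Use the bootstrap, resampling both stages, for a
proper standard error.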
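A minimal Stata sketch of the by-hand average marginal effect and its bootstrap standard error, on made-up simulated data whose variable names follow the thread. The program name cf_ame is hypothetical. Both stages sit inside one rclass program so that bootstrap re-estimates the first-stage residual in every replication, and resampling is by cluster (ID) to match the vce(cluster ID) used in the thread. The margins line is only a cross-check: because the interaction is written as c.Y1##c.X1, margins knows that Y1 also enters through Y1#X1, so its point estimate should match the hand calculation, but its delta-method standard error treats y1_resid as fixed data and ignores the first stage.

```stata
* Made-up data so the sketch runs on its own; only the names follow the thread
clear
set seed 12345
set obs 2000
generate ID  = ceil(_n/4)
generate X1  = runiform(0, 4)
generate IV1 = rnormal()
generate IV2 = rnormal()
generate v   = rnormal()
generate Y1  = 1 + 0.5*X1 + IV1 + 0.5*IV2 + 0.3*IV1*X1 + v
generate Y2  = rpoisson(exp(-1 + 0.2*Y1 + 0.05*Y1*X1 + 0.1*X1 + 0.5*v)*rgamma(2, 0.5))

* Hypothetical program: both stages plus the AME, returned in r(ame)
capture program drop cf_ame
program define cf_ame, rclass
    capture drop y1_resid
    quietly regress Y1 X1 IV1 IV2 c.IV1#c.X1 c.IV2#c.X1
    quietly predict double y1_resid, residuals
    quietly nbreg Y2 c.Y1##c.X1 y1_resid
    tempvar xb dmu
    quietly predict double `xb', xb
    * dE(Y2|Y1,X1,v)/dY1 = (b1 + b2*X1)*exp(xb), CF held fixed
    quietly generate double `dmu' = (_b[Y1] + _b[c.Y1#c.X1]*X1)*exp(`xb') if e(sample)
    summarize `dmu', meanonly
    return scalar ame = r(mean)
end

cf_ame
scalar ame_hand = r(ame)
display "AME of Y1, by hand: " ame_hand

* Cross-check of the point estimate only (its SE ignores the first stage)
margins, dydx(Y1) predict(n)
matrix ame_margins = r(b)

* Cluster bootstrap over both stages for the standard error
bootstrap ame = r(ame), reps(200) seed(2011) cluster(ID) nodots: cf_ame
```

Two hundred replications keep the sketch quick; more are usually advisable for a standard error that will be reported.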



-----Original Message-----
[] On Behalf Of
Sent: Wednesday, March 02, 2011 5:52 PM
Subject: st: margins command after "control function" IV negative binomial

I am estimating a negative binomial model with two endogenous
regressors, the second of which is an interaction between the
endogenous variable and an exogenous term. I am using what Wooldridge
calls the control function approach. Cameron and Trivedi describe
its implementation in Stata (2010, pp. 607-609).

The two endogenous variables are Y1 and Y1_X1 (Y1 interacted with the
exogenous variable X1). Both are interval-level variables.

The system is overidentified. I estimate the first-stage equations as
follows (IV = instrument; IV1_X1 = IV1*X1, IV2_X1 = IV2*X1):

reg Y1    X1   IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict  y1_resid, resid

reg Y1_X1  X1  IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict y1x1_resid, resid

I then estimate the 2nd stage:

nbreg Y2   X1 Y1 Y1_X1   y1_resid  y1x1_resid, vce(cluster ID)

The problem comes in calculating the predicted counts at various
levels of the key variables, e.g.:

margins, predict(n) atmeans at(X1=2 Y1=5 Y1_X1=10)

This produces huge predicted counts, often several times the maximum
of the model's in-sample predicted counts (predict nbreg_hat, n).

Insofar as I can tell, the problem arises because I am setting the
values of the endogenous variables but letting margins set the two
residual terms at their means. (The endogenous variables and their
respective first-stage residuals are correlated at about .7.) But if
that's the mistake, I don't know at what values I should set the
residual terms to get the correct predicted counts. (Setting the
residual terms at the same values as the respective endogenous terms
does not seem to produce sensible results either.)
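A Stata sketch, on made-up simulated data with a single CF, contrasting the two predicted counts margins can report at X1=2 and Y1=5. With factor-variable notation (an assumption of this sketch, not the thread's setup), margins builds the interaction value (here 10) from Y1 and X1 itself. With atmeans, the exponential mean is evaluated at the sample mean of y1_resid, which is essentially zero for OLS residuals; without atmeans, margins averages the prediction over the observed residuals, which applies the averaging-across-the-data idea from the reply above to predicted counts. By Jensen's inequality the averaged count is never smaller than the at-means count, so averaging will not by itself shrink implausibly large predictions. The standard errors margins reports here ignore the first stage.

```stata
* Made-up data so the sketch runs on its own; only the names follow the thread
clear
set seed 12345
set obs 2000
generate ID  = ceil(_n/4)
generate X1  = runiform(0, 4)
generate IV1 = rnormal()
generate IV2 = rnormal()
generate v   = rnormal()
generate Y1  = 1 + 0.5*X1 + IV1 + 0.5*IV2 + 0.3*IV1*X1 + v
generate Y2  = rpoisson(exp(-1 + 0.2*Y1 + 0.05*Y1*X1 + 0.1*X1 + 0.5*v)*rgamma(2, 0.5))

* One first stage, one control function
regress Y1 X1 IV1 IV2 c.IV1#c.X1 c.IV2#c.X1, vce(cluster ID)
predict double y1_resid, residuals
nbreg Y2 c.Y1##c.X1 y1_resid, vce(cluster ID)

* CF set to its sample mean: exp(.) evaluated at a single point
margins, predict(n) at(X1=2 Y1=5) atmeans
matrix n_atmeans = r(b)

* CF left at its observed values: exp(.) averaged over the residuals
margins, predict(n) at(X1=2 Y1=5)
matrix n_avg = r(b)

matrix list n_atmeans
matrix list n_avg
```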

Thanks in advance for any light you might shed on this problem... or  
guidance toward an altogether different approach.

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index