Subject   st: margins command after "control function" IV negative binomial
Date   Wed, 02 Mar 2011 17:52:04 -0500

I am estimating a negative binomial model with two endogenous regressors, the second of which is an interaction between the endogenous variable and an exogenous term. I am using what Wooldridge calls the control function approach. Cameron and Trivedi describe its implementation in Stata (2010, 607-609).

The two endogenous variables are Y1 and Y1_X1 (Y1 interacted with the exogenous variable X1). Both are interval-level variables.

The system is overidentified. I estimate the first-stage equations (IV = instrument; IV1_X1 = IV1*X1, IV2_X1 = IV2*X1):
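
For reference, the interaction terms are plain products of the underlying variables. A minimal sketch of how they can be built, assuming Y1, X1, IV1 and IV2 already exist in the dataset under those names:

* Sketch only: variable names follow the definitions given above
generate Y1_X1  = Y1*X1
generate IV1_X1 = IV1*X1
generate IV2_X1 = IV2*X1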

reg Y1    X1   IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict  y1_resid, resid

reg Y1_X1  X1  IV1  IV2  IV1_X1  IV2_X1, vce(cluster ID)
predict y1x1_resid, resid

I then estimate the second stage, adding both first-stage residuals as regressors:

nbreg Y2   X1 Y1 Y1_X1   y1_resid  y1x1_resid, vce(cluster ID)

The problem comes in calculating the predicted counts at various levels of the key variables, e.g.:

margins, predict(n) atmeans at(X1=2 Y1=5 Y1_X1=10)

This produces huge predicted counts, often several times the maximum in-sample predicted count from the model itself (obtained with: predict nbreg_hat, n).
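
The comparison I am making is roughly the following sketch, run directly after the nbreg command. The variable name nbreg_hat is just the one I chose for the in-sample predictions:

* In-sample predicted counts from the second-stage model
predict nbreg_hat, n
* The Max column (also stored in r(max)) is the benchmark for the margins output
summarize nbreg_hat
display r(max)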

As far as I can tell, the problem arises because I am setting the values of the endogenous variables while letting margins hold the two residual terms at their means. (Each endogenous variable is correlated with its own first-stage residual at about 0.7.) But if that's the mistake, I don't know at what values I should set the residual terms to get the correct predicted counts. (Setting each residual term to the same value as its respective endogenous term does not seem to produce sensible results either.)
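
To make the above concrete, here is a sketch of the checks I mean, run with the nbreg results still active. The values 5 and 10 for the residuals are purely illustrative, taken from the margins call shown earlier. I am not suggesting they are the right values to use:

* Correlation between each endogenous variable and its own first-stage residual
correlate Y1 y1_resid
correlate Y1_X1 y1x1_resid

* The attempt described above: residuals set to the same values as the endogenous terms
margins, predict(n) at(X1=2 Y1=5 Y1_X1=10 y1_resid=5 y1x1_resid=10)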

Thanks in advance for any light you might shed on this problem... or guidance toward an altogether different approach.

