Sorry - this was a bad example. There was no "identification" issue in
my actual regression. My basic issue is with the syntax of the qvf
command.
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin
Nichols
Sent: Friday, March 10, 2006 3:01 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: qvf command for count data
On 3/10/06, Hugh Colaco <Hugh.Colaco@business.uconn.edu> wrote:
> qvf y x1 x2 x3 x4 (z1 x3 x4), family(nbinomial) robust cluster (A);
> ivreg y x1 x2 x3 x4 (z1 = x3 x4), robust cluster (A);
You seem to be misspecifying both the -ivreg- and -qvf- calls at a very
basic level: which variables are the included instruments, and which are
the excluded ones? Do you mean z1 to be an excluded instrument for two
endogenous variables, x3 and x4? If so, your equation is not identified.
Note that your -ivreg- syntax regresses y on x1, x2, x3, x4, and z1
(where z1 is instrumented by x3 and x4), and it will not run as written:
. net from http://www.stata-journal.com/software/sj3-4
. net inst st0049
. clear
. set obs 1000
. gen x1 = uniform()
. gen x2 = uniform()
. gen x3 = uniform()
. gen err = invnorm(uniform())
. gen y = 1+2*x1+3*x2+4*x3+err
. gen x4 = uniform()
. gen t3 = .8*x3 + .6*invnorm(uniform())
. ivreg y x1 x2 x3 x4 (z1 = x3 x4)
equation not identified; must have at least as many instruments not in
the regression as there are instrumented variables
r(481);
. qvf y x1 x2 x3 x4 (x1 x2 x4 t3)
IV Generalized linear models                       No. of obs      =      1000
Optimization     : MQL Fisher scoring              Residual df     =       995
                   (IRLS EIM)                      Scale param     =  2.137276
Deviance         =  2126.589444                    (1/df) Deviance =  2.137276
Pearson          =   2126.58962                    (1/df) Pearson  =  2.137276
Variance Function: V(u) = 1                        [Gaussian]
Link Function    : g(u) = u                        [Identity]
Standard Errors  : OIM Sandwich
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.914558    .106234    18.02   0.000     1.706343    2.122773
          x2 |   2.912829   .1086845    26.80   0.000     2.699811    3.125846
          x4 |   .2775132   .1095646     2.53   0.011     .0627706    .4922558
          x3 |   4.106679   .3455157    11.89   0.000     3.429481    4.783877
       _cons |     .93817    .193988     4.84   0.000     .5579605     1.31838
------------------------------------------------------------------------------
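For comparison, here is a minimal sketch of how the two calls line up,
assuming (as in the simulated data above) that x3 is the only endogenous
regressor and t3 is its single excluded instrument. In -ivreg- the
exogenous regressors serve as instruments implicitly; in the -qvf- call
above, the parenthesized list names every instrument, exogenous
regressors included. These roles are assumptions taken from the example,
not from the real model.

```stata
* Sketch only: x3 endogenous, t3 excluded instrument (assumed roles).
* -ivreg-: the endogenous variable and its excluded instrument go inside
* the parentheses as (endogenous = excluded instruments).
ivreg y x1 x2 x4 (x3 = t3)
* -qvf-: all regressors first, then the full instrument list in parentheses.
qvf y x1 x2 x3 x4 (x1 x2 x4 t3)
```

With one excluded instrument for one endogenous regressor, the order
condition holds with equality, so the equation is exactly identified.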
Try using -ivreg2- instead. It's got good first-stage diagnostics, and
the fact that your endogenous variable is a count variable does not
imply the standard IV estimator is not consistent--just that you lose a
tiny bit of efficiency by disregarding that fact. Note that many of the
classic RHS endogenous variables are counts, e.g. educational
attainment, and most researchers would use -ivreg2- on these models.
. ssc install ivreg2
. ivreg2 y x1 x2 x4 (x3=z1), ffirst
Summary results for first-stage regressions
-------------------------------------------
                  Shea
Variable    | Partial R2   |  Partial R2    F(  1,   995)    P-value
x3          |    0.1009    |    0.1009         111.65        0.0000

Underidentification tests:
                                                   Chi-sq(1)   P-value
Anderson canon. corr. likelihood ratio stat.         106.35    0.0000
Cragg-Donald N*minEval stat.                         112.21    0.0000
Ho: matrix of reduced form coefficients has rank=K-1 (underidentified)
Ha: matrix has rank>=K (identified)

Weak identification statistics:
Cragg-Donald (N-L)*minEval/L2 F-stat                 111.65

Anderson-Rubin test of joint significance of
endogenous regressors B1 in main equation, Ho:B1=0
  F(1,995)=      67.79     P-val=0.0000
  Chi-sq(1)=     68.13     P-val=0.0000

Number of observations N          =       1000
Number of regressors   K          =          5
Number of instruments  L          =          5
Number of excluded instruments L2 =          1
Instrumental variables (2SLS) regression
----------------------------------------
                                                      Number of obs =     1000
                                                      F(  4,   995) =   292.31
                                                      Prob > F      =   0.0000
Total (centered) SS     =  3276.986562                Centered R2   =   0.7013
Total (uncentered) SS   =  33323.59494                Uncentered R2 =   0.9706
Residual SS             =  978.9706817                Root MSE      =    .9894
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   4.106679    .337572    12.17   0.000      3.44505    4.768308
          x1 |   1.914558   .1075383    17.80   0.000     1.703787    2.125329
          x2 |   2.912829   .1075683    27.08   0.000     2.701999    3.123658
          x4 |   .2775132   .1073605     2.58   0.010     .0670905    .4879358
       _cons |     .93817   .1888342     4.97   0.000     .5680617    1.308278
------------------------------------------------------------------------------
Anderson canon. corr. LR statistic (identification/IV relevance test): 106.350
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         x3
Included instruments: x1 x2 x4
Excluded instruments: z1
------------------------------------------------------------------------------
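To carry this over to the model in the original post, a hedged sketch:
the clustering variable A comes from the original command, while treating
x3 as the endogenous regressor and z1 as its excluded instrument is
purely an assumption about the real data. -ivreg2- accepts robust and
cluster() as options alongside ffirst.

```stata
* Assumed roles: x3 endogenous, z1 excluded instrument, A cluster identifier.
* ffirst reports the first-stage diagnostics; cluster(A) requests
* cluster-robust standard errors.
ivreg2 y x1 x2 x4 (x3 = z1), ffirst robust cluster(A)
```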
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/