Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: xtivreg2, clustered errors and F statistic

From	"Schaffer, Mark E" <[email protected]>
To	<[email protected]>
Subject	st: RE: xtivreg2, clustered errors and F statistic
Date	Wed, 12 Oct 2011 00:37:29 +0100
Anna,

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Anna Rosso
> Sent: 11 October 2011 23:09
> To: [email protected]
> Subject: st: xtivreg2, clustered errors and F statistic
> 
> Dear list,
> 
> I am using Stata/MP 11.2 for Unix (Linux 64-bit x86-64), Born 30 Mar 2011.
> 
> I am running IV regressions using xtivreg2 (latest version) on a panel
> of 16 regions and 11 years.
> 
> I estimate using regional fixed effects. My specification includes,
> among the RHS variables, year dummies, 6 continuous control variables,
> and the endogenous regressor of interest. I use 90 IVs to instrument my
> endogenous regressor, and I cluster standard errors at the regional level.

Here are the counts from your estimation without controls:

Number of clusters             N_clust  =         16
Number of observations               N  =        176
Number of regressors                 K  =         11
Number of endogenous regressors      K1 =          1
Number of instruments                L  =        100
Number of excluded instruments       L1 =          0

Things are going very badly wrong, and you may in fact have found a bug
in xtivreg2 or ivreg2: you somehow have 1 endogenous regressor but no
excluded instruments!  Please send me your full output for this
specification privately, and I will try to trace what happened.
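As a quick check on the suspected bug, here is a minimal Python sketch. It assumes ivreg2's usual accounting, in which the L instruments are the K - K1 exogenous regressors plus the L1 excluded instruments, so that L1 = L - (K - K1). The helper name implied_excluded is hypothetical and is not part of ivreg2. Using the counts reported in this thread, both runs imply 90 excluded instruments, which matches the 90 IVs described. ivreg2, however, reports 0 for the first run and 6 for the second.

```python
def implied_excluded(K, K1, L):
    """Excluded instruments implied by L = (K - K1) + L1 (hypothetical helper)."""
    exogenous_regressors = K - K1
    return L - exogenous_regressors

# Counts as printed by xtivreg2 for the two specifications in this thread
runs = {
    "without controls": {"K": 11, "K1": 1, "L": 100, "L1_reported": 0},
    "with controls": {"K": 17, "K1": 1, "L": 106, "L1_reported": 6},
}

for name, c in runs.items():
    implied = implied_excluded(c["K"], c["K1"], c["L"])
    print(f"{name}: implied L1 = {implied}, reported L1 = {c['L1_reported']}")
```

A mismatch between the implied and reported L1 is a sign that the estimator's internal bookkeeping of the instrument list went astray.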

But you are unlikely to get sensible results with a setup like this in
any case.  You have 176 observations, 100 instruments and 1 endogenous
variable, so the degree of overidentification is very high compared to
the sample size.  The finite-sample bias of the IV estimator is
increasing in the degree of overidentification, so your estimated
coefficient is likely to be badly biased.  And that's not even
considering the problem that you have only 16 clusters.  For the
cluster-robust VCV to be consistent, the number of clusters has to go
off to infinity, and 16 isn't very far on the way to infinity.  Also, as
you note, you cannot test for weak identification or underidentification,
because the rank of the first-stage VCV will be at most 16, so you can't
test the joint significance of the 100 instruments in the first-stage
regression.
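The rank argument can be illustrated with a small sketch. A cluster-robust VCV is built from a "meat" matrix S = sum over clusters g of s_g s_g', where s_g is the L-vector of summed instrument-times-residual products for cluster g. A sum of G rank-one terms has rank at most G. The sketch assumes that random integer vectors can stand in for the s_g. It is not ivreg2's code, and the function names are hypothetical. It computes the exact rank of S for 16 clusters and 100 instruments. Whatever the random draw, the printed rank cannot exceed 16, well short of the 100 needed for full rank.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Look for a nonzero pivot at or below the current rank position
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot
        for r in range(rank + 1, n_rows):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


def cluster_meat(scores):
    """Sum over clusters of the outer product s_g s_g' (an L x L matrix)."""
    L = len(scores[0])
    S = [[0] * L for _ in range(L)]
    for s in scores:
        for i in range(L):
            if s[i] == 0:
                continue
            for j in range(L):
                S[i][j] += s[i] * s[j]
    return S


random.seed(1)
G, L = 16, 100  # clusters and instruments, as in this thread
# Hypothetical stand-ins for the per-cluster score sums s_g
scores = [[random.randint(-5, 5) for _ in range(L)] for _ in range(G)]
S = cluster_meat(scores)
print("clusters:", G, "instruments:", L, "rank of S:", matrix_rank(S))
```

Exact rational arithmetic is used instead of floating point so that the rank is not blurred by rounding tolerances.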

--Mark


> 
> The problem is the following:
> When I include as RHS variables only the region and year dummies and
> the endogenous regressor, my first-stage F statistic for the
> significance of the excluded instruments goes to infinity. This is
> what I would expect given that the degrees of freedom are "negative":
> the F statistic is distributed as F(k,d-k), where k is the number of
> constraints (90 in my case, as I have 90 instruments to test) and d is
> the number of clusters (16).
> When I add the six additional control variables, my first-stage F
> statistic looks quite normal: 14.73. Is this plausible? I don't
> understand why this happens only when I include the controls in the
> regression.
> 
> I have also tried to "partial out" some variables, as suggested in Baum,
> Schaffer and Stillman's paper ("Enhanced routines for instrumental
> variables/generalized method of moments", The Stata Journal (2007),
> 7(4), pp. 465-506) for the case where the number of clusters is less
> than the number of exogenous regressors plus excluded instruments.
> Partialling out some exogenous regressors helps the covariance matrix of
> the orthogonality conditions attain full rank. Unfortunately, this still
> has not solved the problem, because I have so many instruments. Also,
> the Kleibergen-Paap Wald rk F statistic (which the authors of the above
> paper recommend in the case of clustered errors) is reported as missing.
> 
> Below are the command and the first-stage statistics when I control
> only for year dummies, using the fixed-effects estimator (xtivreg2 with
> the fe and cluster() options):
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> xi: xtivreg2 netpay (share_reg = GWmean40_UK_* GWmean40_USA_* ///
>     GWmean40_DE_* mean2004_40_UK_* mean2004_40_USA_* mean2004_40_DE_*) ///
>     i.year if year>=1997 & year<=2007, fe cluster(won) first
> 
> .....
> 
> F test of excluded instruments:
>   F( 90,    15) =  1.3e+13
>   Prob > F      =   0.0000
> Angrist-Pischke multivariate F test of excluded instruments:
>   F( 90,    15) =  6.7e+12
>   Prob > F      =   0.0000
> 
> Summary results for first-stage regressions
> -------------------------------------------
> 
>                                            (Underid)            (Weak id)
> Variable     | F( 90,    15)  P-val | AP Chi-sq( 90) P-val | AP F( 90,    15)
> share_reg    |    1.3e+13    0.0000 |     1.5e+15   0.0000 |    6.7e+12
> 
> NB: first-stage test statistics cluster-robust
> 
> .....
> 
> Underidentification test
> Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
> Ha: matrix has rank=K1 (identified)
> Kleibergen-Paap rk LM statistic          Chi-sq(90)=.       P-val=     .
> 
> Weak identification test
> Ho: equation is weakly identified
> Cragg-Donald Wald F statistic                                       2.99
> Kleibergen-Paap Wald rk F statistic                                    .
> .....
> 
> Weak-instrument-robust inference
> Tests of joint significance of endogenous regressors B1 in main equation
> Ho: B1=0 and orthogonality conditions are valid
> Anderson-Rubin Wald test           F(0,15)=           .     P-val=     .
> Anderson-Rubin Wald test           Chi-sq(0)=         .     P-val=     .
> Stock-Wright LM S statistic        Chi-sq(0)=         .     P-val=     .
> 
> ....
> 
> Number of clusters             N_clust  =         16
> Number of observations               N  =        176
> Number of regressors                 K  =         11
> Number of endogenous regressors      K1 =          1
> Number of instruments                L  =        100
> Number of excluded instruments       L1 =          0
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> And this is the command and output with extra controls:
> 
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> xi: xtivreg2 netpay public pop age sex shar_ed2 shar_ed3 (share_reg = ///
>     GWmean40_UK_* GWmean40_USA_* GWmean40_DE_* mean2004_40_UK_* ///
>     mean2004_40_USA_* mean2004_40_DE_*) i.year if year>=1997 & year<=2007, ///
>     fe cluster(won) first
> 
> .......
> 
> F test of excluded instruments:
>   F( 90,    15) =    14.73
>   Prob > F      =   0.0000
> Angrist-Pischke multivariate F test of excluded instruments:
>   F( 90,    15) =    14.73
>   Prob > F      =   0.0000
> 
> Summary results for first-stage regressions
> -------------------------------------------
> 
>                                            (Underid)            (Weak id)
> Variable     | F( 90,    15)  P-val | AP Chi-sq( 90) P-val | AP F( 90,    15)
> share_reg    |      14.73    0.0000 |     3535.83   0.0000 |      14.73
> 
> NB: first-stage test statistics cluster-robust
> 
> .......
> 
> Underidentification test
> Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
> Ha: matrix has rank=K1 (identified)
> Kleibergen-Paap rk LM statistic          Chi-sq(90)=.       P-val=     .
> 
> Weak identification test
> Ho: equation is weakly identified
> Cragg-Donald Wald F statistic                                       2.53
> Kleibergen-Paap Wald rk F statistic                                    .
> 
> ......
> 
> Weak-instrument-robust inference
> Tests of joint significance of endogenous regressors B1 in main equation
> Ho: B1=0 and orthogonality conditions are valid
> Anderson-Rubin Wald test           F(6,15)=       16.04     P-val=0.0000
> Anderson-Rubin Wald test           Chi-sq(6)=    256.57     P-val=0.0000
> Stock-Wright LM S statistic        Chi-sq(6)=         .     P-val=     .
> 
> NB: Underidentification, weak identification and weak-identification-robust
>     test statistics cluster-robust
> 
> Number of clusters             N_clust  =         16
> Number of observations               N  =        176
> Number of regressors                 K  =         17
> Number of endogenous regressors      K1 =          1
> Number of instruments                L  =        106
> Number of excluded instruments       L1 =          6
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> 
> 
> Thanks for your consideration.
> 
> Best regards,
> 
> Anna Rosso
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-- 
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.

Heriot-Watt University is the Sunday Times
Scottish University of the Year 2011-2012



References:
- st: xtivreg2, clustered errors and F statistic
  - From: "Anna Rosso" <[email protected]>