Statalist



st: RE: RE: xtoverid error: internal reestimation of eqn differs from original


From   "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: xtoverid error: internal reestimation of eqn differs from original
Date   Thu, 10 Jul 2008 12:39:56 +0100

Hewan,

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Hewan Belay
> Sent: 09 July 2008 22:05
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: xtoverid error: internal reestimation of eqn 
> differs from original
> 
> Dear Mark,
> 
> Interesting points, thanks for your insights. It seems quite 
> hard to get the -xtoverid, noisily- command to produce 
> intelligible variable names. I followed your advice and 
> created the lagged variables "by hand", and reran -xthtaylor- 
> using these variables, and the temporary variables of    
> -xtoverid, noi- still didn't resemble the vars in my model. 
> Then, to narrow things down, I generated the two endogenous 
> vars, which happen to be lagged variables, using very short 
> names for them, and still no luck. Specifically, the two 
> endog vars are L.rev_EXT and L.rev_IGF_3avg. I named them ex 
> and ig respectively, and reran. The endogenous var that's 
> being reclassified as exogenous still has a nondescript 
> temporary name __000019, as opposed to something resembling 
> "ex" or "ig". I'm suspecting the problem child is the 
> endogenous variable L.rev_IGF_3avg (ie a LDV), and not 
> L.rev_EXT. Because when I run the regression with only the 
> latter being endog, xtoverid works fine. When I run it with 
> only the former being endog, xtoverid gives the error message 
> we're talking about. 
> 
> If indeed L.rev_IGF_3avg is perfectly collinear with some 
> combination of the remaining variables, and if (as you're 
> suspecting) xthtaylor somehow ignores that fact, at least for 
> sure a standard OLS should drop one or more variables in the 
> presence of perfect collinearity, right?

Here's a guess...

Internally, the Hausman-Taylor estimator works by creating new variables
based on the existing ones.  Specifically, it will take a variable and
(depending on how you specify it) create a new variable that is a group
mean (constant within groups, different across groups) and/or another
new variable that is a mean-deviation (different within groups but the
group mean is zero).  Another new variable that can be created would be
a GLS transform combining the group means and demeans.
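
As a minimal sketch of the two basic transforms (group mean and mean-deviation), here is how they could be built by hand on a made-up panel. The dataset and the names id, t, x, x_m and x_dm are hypothetical, and this only illustrates the idea; it is not the internal code of -xthtaylor-.

```stata
* Toy balanced panel: 5 groups, 4 periods each (all names are made up)
clear
set seed 12345
set obs 5
gen id = _n
expand 4
bysort id: gen t = _n
gen double x = id + rnormal()

* Group mean: constant within groups, different across groups
bysort id: egen double x_m = mean(x)

* Mean-deviation: varies within groups, averages to zero in each group
gen double x_dm = x - x_m

list id t x x_m x_dm in 1/8, sepby(id)
```

By construction x = x_m + x_dm for every observation, which is why any linear combination of x and one of the two can be rewritten in terms of x_m and x_dm.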

Some or all of the temporary variables that -xtoverid- is reporting are
these new variables.

The -ec2sls- estimator for -xtivreg- does the same sort of thing.

Now Stata's -xthtaylor- and -xtivreg, ec2sls- do something a bit odd
with these variables.  Say you have a variable X that is a time-varying
exogenous regressor.  You transform X so that you have two new variables
X_M and X_DM (group mean and demeaned, respectively).  You then combine
X and X_M to get the GLS transform, which would be X_GLS = (X -
theta*X_M), where theta is a scalar if it's a balanced panel and a
vector if it's unbalanced.  In the transformed regression to get
-xthtaylor- results, X_GLS is an exogenous regressor.

If the panel is balanced, then either X_M OR X_DM is available as a new
excluded instrument, but NOT both.  This is because, in a balanced
panel, X_M and X_DM together are perfectly collinear with the regressor
X_GLS; the theta in the previous paragraph is a scalar.
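
To see the balanced-panel collinearity numerically, here is a minimal sketch on a made-up panel. It assumes the usual random-effects quasi-demeaning, x - theta*(group mean), as the GLS transform; the value theta = 0.6 and all variable names are hypothetical.

```stata
* Toy balanced panel: 5 groups, 4 periods each (all names are made up)
clear
set seed 12345
set obs 5
gen id = _n
expand 4
gen double x = id + rnormal()
bysort id: egen double x_m = mean(x)
gen double x_dm = x - x_m

* Hypothetical scalar theta: the same value for every group
scalar theta = 0.6
gen double x_gls = x - theta*x_m

* x_gls = x_dm + (1 - theta)*x_m exactly, so the fit is perfect
regress x_gls x_m x_dm
display "residual SS = " e(rss)
```

The residual sum of squares should be zero up to rounding, with a coefficient of 1 - theta on x_m and 1 on x_dm. That is the sense in which X_M and X_DM together are perfectly collinear with X_GLS.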

If the panel is unbalanced, X_M and X_DM are not perfectly collinear
with X_GLS, because the theta is a vector, not a scalar.  You could use
both X_M and X_DM as excluded instruments and not have perfect
collinearity.  It might be almost collinear (if all the elements of the
theta vector are very similar to each other), but not exactly collinear.
But it wouldn't be a sensible thing to do, since the second instrument
would be adding very little to the first.  But which to use?  X_M or
X_DM?
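
The unbalanced case can be sketched the same way. Here theta varies with the group size T_i; the formula theta_i = 1 - sqrt(s2_e/(T_i*s2_u + s2_e)) is the standard random-effects one, while the variance components (both set to 1) and all variable names are made up for illustration.

```stata
* Toy unbalanced panel: odd-numbered groups have 4 obs, even-numbered have 2
clear
set seed 12345
set obs 6
gen id = _n
gen T = cond(mod(_n, 2), 4, 2)
expand T
gen double x = id + rnormal()
bysort id: egen double x_m = mean(x)
gen double x_dm = x - x_m

* Hypothetical variance components; theta_i now depends on T_i
scalar s2_u = 1
scalar s2_e = 1
gen double theta_i = 1 - sqrt(s2_e/(T*s2_u + s2_e))
gen double x_gls = x - theta_i*x_m

* No single pair of coefficients reproduces x_gls any more
regress x_gls x_m x_dm
display "residual SS = " e(rss)
```

The fit is close but the residual sum of squares is no longer zero, which is the "almost but not exactly collinear" situation described above.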

Here is the odd bit: what Stata does internally is treat X_GLS as
another *endogenous* regressor, and uses *both* X_M and X_DM as
instruments.  It looks odd, but it makes sense: by adding 1 endogenous
regressor and 2 excluded instruments, the degree of overidentification
is going up by 1, which is right.

My guess is that you have an X_GLS that is, for some reason, collinear
with some of the excluded instruments.

A way to test this is to limit your estimation sample to a balanced
panel.  -xtoverid- checks for this and knows that the X_GLS variables
should be treated as exogenous regressors.  If you don't get an error
message, that's probably it.
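
A minimal sketch of restricting a sample to a balanced panel, run here on made-up data so that it is self-contained. The group variable is called code to match the output quoted below; year, insamp and nT are hypothetical names. In the real application insamp would be set from e(sample) right after -xthtaylor-, and the model and -xtoverid- would then be re-run on the reduced sample.

```stata
* Toy unbalanced panel: odd-numbered groups have 7 obs, even-numbered have 5
clear
set obs 6
gen code = _n
gen T = cond(mod(_n, 2), 7, 5)
expand T
bysort code: gen year = 2000 + _n

* Stand-in for:  gen byte insamp = e(sample)   (after -xthtaylor-)
gen byte insamp = 1

* Count usable observations per group and keep only the full-length groups
bysort code: egen nT = total(insamp)
quietly summarize nT
local Tmax = r(max)
keep if insamp & nT == `Tmax'

* -xtset- reports whether the remaining panel is balanced
xtset code year
```

Counting observations per group is only a rough check: groups with the same number of observations could still cover different periods, so it is worth looking at what -xtset- (or -xtdescribe-) reports.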

NB: For anyone who read this far, I have a working version of xtivreg2
that generalizes to fixed effects, random effects, EC2SLS,
Hausman-Taylor, G2SLS, etc.  Here is the current syntax for replicating
a Hausman-Taylor estimation:

* H-T estimation
xthtaylor ln_w age age2 tenure hours black birth_yr grade,	/*
	*/ endog(tenure hours grade) constant(black birth_yr grade) i(idcode)

* xtivreg3 estimation, balanced panel
xtivreg3 ln_w age age2 black birth_yr (tenure hours grade=),	/* Endog vars in ()
	*/ ivdm( (=age age2 tenure hours) )	/* Excl IVs, demeaned
	*/ gls i(idcode) small	/* Apply GLS transform to dep var & regressors */

* xtivreg3 estimation, unbalanced panel
xtivreg3 ln_w (age age2 tenure hours black birth_yr grade=),	/* Endog vars in ()
	*/ ivm(  (=age age2 black birth_yr) )	/* Excl IVs, means
	*/ ivdm( (=age age2 tenure hours)   )	/* Excl IVs, demeaned
	*/ gls i(idcode) small	/* Apply GLS transform to dep var & regressors */

Note how, in the unbalanced case, exogenous regressors get added to the
endog variable list as GLS transforms, and to the mean and demeaned
instruments list.

Someday I will get around to finishing this and releasing it....

Cheers,
Mark

Prof. Mark Schaffer
Director, CERT
Department of Economics
School of Management & Languages
Heriot-Watt University
Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3296
email: m.e.schaffer@hw.ac.uk
web: http://www.sml.hw.ac.uk/ecomes

> But when I do a 
> simple -regress- on all the variables in this model, nothing 
> drops out (I can send you the output on that). Does that not 
> rule out perfect collinearity being the cause of the xtoverid 
> problem? Is xtoverid sensitive to strong (but non-perfect) 
> collinearity?
> 
> Please see the output below for details on my above described 
> effort with the two endog vars renamed "ex" and "ig". I would 
> very much appreciate any further thoughts you may have on 
> this mystery.
> 
> Thanks,
> Hewan
> . g ex = L.rev_EXT
> (265 missing values generated)
> 
> . g ig = L.rev_IGF_3avg
> (440 missing values generated)
> 
> . xthtaylor rev_IGF_3avg popurb_share popdens pop p0 rain_av road_no literate rel_christ *akan *ewe ex ig
> >     L.(exp_pers_act exp_NPR exp_cap_act) dumreg1-dumreg7 dumreg9-dumreg10, endog(ex ig)
> >     varying(ex ig L.(exp_pers_act exp_NPR exp_cap_act))
> 
> Hausman-Taylor estimation                       Number of obs      =       699
> Group variable: code                            Number of groups   =       106
> 
>                                                 Obs per group: min =         5
>                                                                avg =       6.6
>                                                                max =         7
> 
> Random effects u_i ~ i.i.d.                     Wald chi2(24)      =   5417.96
>                                                 Prob > chi2        =    0.0000
> 
> 
> rev_IGF_3avg       Coef.   Std. Err.      z    P>z     [95% Conf. Interval]
> 
> TVexogenous
> exp_pers_act 
> L1.    .0788914   .0169636     4.65   0.000     .0456433    .1121394
> exp_NPR 
> L1.    .1992826   .0252538     7.89   0.000     .1497861     .248779
> exp_cap_act 
> L1.   -.0038548   .0113339    -0.34   0.734    -.0260688    .0183592
> TVendogenous 
> ex    .0068097   .0148884     0.46   0.647    -.0223711    .0359905
> ig    .6608472   .0284122    23.26   0.000     .6051602    .7165341
> TIexogenous  
> popurb_share    .0011771    .000573     2.05   0.040     .0000541    .0023002
> popdens   -.0001731   .0000585    -2.96   0.003    -.0002877   -.0000584
> pop    .0001918   .0001302     1.47   0.141    -.0000633    .0004469
> p0   -.3915243   .1268809    -3.09   0.002    -.6402062   -.1428423
> rain_av    .0000586   .0000765     0.77   0.444    -.0000913    .0002086
> road_no   -.0120262   .0741182    -0.16   0.871    -.1572952    .1332429
> literate    .0035863   .0017501     2.05   0.040     .0001562    .0070164
> rel_christ   -.0016295   .0012886    -1.26   0.206    -.0041552    .0008962
> ethn_akan   -.0008522   .0006921    -1.23   0.218    -.0022086    .0005043
> ethn_ewe    .0004666   .0008781     0.53   0.595    -.0012544    .0021877
> dumreg1    .0913287   .0850729     1.07   0.283    -.0754111    .2580684
> dumreg2   -.0432531   .0780054    -0.55   0.579    -.1961408    .1096345
> dumreg3    .0388221   .0882678     0.44   0.660    -.1341796    .2118237
> dumreg4   -.0929884   .0735551    -1.26   0.206    -.2371536    .0511769
> dumreg5   -.0559623   .0740561    -0.76   0.450    -.2011097    .0891851
> dumreg6    .0299387   .0740921     0.40   0.686    -.1152792    .1751566
> dumreg7    -.044829   .0717926    -0.62   0.532    -.1855399    .0958819
> dumreg9    .1245367    .059701     2.09   0.037     .0075249    .2415485
> dumreg10    .1489092    .062665     2.38   0.017     .0260881    .2717303
>              
> _cons    .5605179   .2319658     2.42   0.016     .1058733    1.015162
> 
> sigma_u   .01675734
> sigma_e   .21148494
> rho   .00623926   (fraction of variance due to u_i)
> 
> Note:  TV refers to time varying; TI refers to time invariant.
> 
> . xtoverid, noisily
> Warning - endogenous variable(s) collinear with instruments
> Vars now exogenous: __000019
> 
> Unable to display summary of first-stage estimates; macro e(first) is missing
> 
> IV (2SLS) estimation
> 
> 
> Estimates efficient for homoskedasticity only
> Statistics consistent for homoskedasticity only
> 
> Number of obs =      699
> F( 25,   674) = 34571.80
> Prob > F      =   0.0000
> Total (centered) SS     =  292.3090099                Centered R2   =   0.8935
> Total (uncentered) SS   =  39947.58709                Uncentered R2 =   0.9992
> Residual SS             =  31.11655374                Root MSE      =    .2149
> 
> 
> __00000I       Coef.   Std. Err.      t    P>t     [95% Conf. Interval]
> 
> __00000M    .0788883   .0169636     4.65   0.000     .0455804    .1121962
> __00000P    .1992816   .0252538     7.89   0.000      .149696    .2488672
> __00000S   -.0038579   .0113339    -0.34   0.734    -.0261119     .018396
> __00000V     .006814   .0148884     0.46   0.647    -.0224193    .0360473
> __00000Y     .660834   .0284121    23.26   0.000     .6050471    .7166208
> __00000Z    .0011771    .000573     2.05   0.040      .000052    .0023022
> __000010    -.000173   .0000585    -2.96   0.003    -.0002879   -.0000582
> __000011    .0001918   .0001302     1.47   0.141    -.0000638    .0004474
> __000012   -.3915953   .1268798    -3.09   0.002    -.6407224   -.1424682
> __000013    .0000586   .0000765     0.77   0.444    -.0000916    .0002088
> __000014   -.0120329   .0741184    -0.16   0.871    -.1575636    .1334978
> __000015    .0035862   .0017501     2.05   0.041     .0001499    .0070224
> __000016   -.0016296   .0012886    -1.26   0.206    -.0041598    .0009007
> __000017   -.0008521   .0006921    -1.23   0.219     -.002211    .0005068
> __000018    .0004667   .0008781     0.53   0.595    -.0012575    .0021908
> __00001A   -.0432515   .0780055    -0.55   0.579    -.1964146    .1099115
> __00001B      .03883   .0882679     0.44   0.660    -.1344831    .2121432
> __00001C   -.0929823   .0735552    -1.26   0.207    -.2374072    .0514425
> __00001D   -.0559592   .0740563    -0.76   0.450     -.201368    .0894496
> __00001E    .0299385   .0740923     0.40   0.686    -.1155409     .175418
> __00001F   -.0448234   .0717927    -0.62   0.533    -.1857877    .0961408
> __00001G     .124552    .059701     2.09   0.037     .0073298    .2417743
> __00001H    .1489231    .062665     2.38   0.018      .025881    .2719652
> __00000H    .5606982   .2319614     2.42   0.016     .1052443    1.016152
> __000019    .0913319    .085073     1.07   0.283    -.0757082    .2583719
> 
> Sargan statistic (overidentification test of all instruments):           9.249
> Chi-sq(4) P-val =    0.0552
> 
> Instrumented:         __00000M __00000P __00000S __00000V __00000Y __00000Z
> __000010 __000011 __000012 __000013 __000014 __000015
> __000016 __000017 __000018 __000019 __00001A __00001B
> __00001C __00001D __00001E __00001F __00001G __00001H
> Included instruments: __00000H
> Excluded instruments: __00000L __00000O __00000R __00000U __00000X __00000K
> __00000N __00000Q popurb_share popdens pop p0 rain_av road_no literate
> rel_christ ethn_akan ethn_ewe dumreg1 dumreg2 dumreg3 dumreg4 dumreg5
> dumreg6 dumreg7 dumreg9 dumreg10
> Reclassified as exog: __000019
> 
> xtoverid error: internal reestimation of eqn differs from original
> r(198);
> 
> end of do-file
> 
> r(198);
> 
> 
>       
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 



