Dear Statalisters,
I do hate to trouble everyone with a general syntax query. I think I may have resolved my query thanks to the Statalist archives and the xtabond2 help file, but I would be most grateful for confirmation of this from regular users of the -xtabond2- routine.
Our model is specified as:
y_i,t = constant + B_1*y_i,t-1 + B_2*x1_i,t + B_3*x2_i,t + B_4*x3_i,t + B_5*x4_i,t + err_i,t
where:
y is the dependent variable,
x1 is an endogenous variable,
and x2, x3 and x4 are exogenous control variables.
The -xtabond2- command I assume would therefore be expressed as:
. xtabond2 y l.y x1 x2 x3 x4, gmm(l.y x1, lag(1 .)) iv(x2 x3 x4) robust twostep
I have five questions regarding this syntax:
1) Specification of Lags within the -gmm()- option:
In the above syntax, I have specified lag(1 .). Is this correct, or should it be lag(2 .)? Do I actually need to specify the length of the lag? This problem arises because I cannot reconcile Blundell & Bond (1998) System GMM with the Stata defaults. My understanding of the Blundell and Bond (1998) System GMM is that the instruments for the difference equation are the variables in the levels equation dated t-2 and earlier; and the instruments for the level equation are the first-differences of variables dated t-1 and earlier. However, according to my reading of the xtabond2 help file, -xtabond2- uses [lagged] "levels dated t-1 or earlier as instruments for the first-difference equations; and uses the contemporaneous first differences as instruments in the levels equations."
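To make the two readings concrete, the alternatives I am weighing could be written as below. This is only a sketch of the syntax being compared, not a claim about which is correct. It assumes -xtabond2- is installed (it is user-written) and uses the variable names from the model above; the split into two -gmm()- groups in the second command is my own illustration.

```stata
* Sketch only. xtabond2 is user-written: ssc install xtabond2
* Variables y, x1, x2, x3, x4 are those of the model above.

* Alternative A: the command as posted. The lag limits (1 .) apply to
* both l.y and x1. Lag 1 of l.y is y dated t-2, so for the lagged
* dependent variable this already starts at y_{t-2}.
xtabond2 y l.y x1 x2 x3 x4, gmm(l.y x1, lag(1 .)) iv(x2 x3 x4) robust twostep

* Alternative B: separate gmm() groups, so that the endogenous x1
* starts at lag 2 (x1 dated t-2 and earlier) while l.y keeps (1 .).
xtabond2 y l.y x1 x2 x3 x4, gmm(l.y, lag(1 .)) gmm(x1, lag(2 .)) ///
    iv(x2 x3 x4) robust twostep
```

The only difference between the two is the earliest lag of x1 used as an instrument, which is exactly the point I am unsure about.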
2) Appropriate Inclusion of Variables to Be Used as Instruments using the -iv()- option:
I have included x2, x3 and x4, which I am treating as exogenous, in the -iv()- option. Is this correct, as they are not strictly instruments?
3) Interpretation of the Arellano-Bond AR(1) and AR(2) tests:
Having run the twostep System GMM, I believe that by construction there should be evidence of first-order serial correlation in the differenced residuals. Is it problematic if, on occasion, there is no evidence of this? On the other hand, if second-order serial correlation is detected, how should one deal with this? For instance, in time series one would simply try adding lags. Does this approach also apply to panel data?
4) Missing Observations Query:
I am using an unbalanced panel of non-overlapping five-year averages, with a maximum of five observations per country (one for each of the five five-year periods). Occasionally, I am missing observation(s) in the interior of the sample (i.e. periods 2, 3 and/or 4) rather than at the beginning or the end. Is this a problem for -xtabond2-?
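For reference, the gaps I am describing can be inspected along the following lines. This is a minimal sketch using my variable names countryid and period; the variable gap is a hypothetical helper, and the check assumes a missing period shows up as an absent row rather than a row of missing values.

```stata
* Sketch: inspect which periods are present for each country.
tsset countryid period
xtdescribe

* Flag interior gaps: an observation whose previous available period
* for the same country is more than one period earlier.
* gap is missing for each country's first observation.
bysort countryid (period): generate byte gap = (period - period[_n-1] > 1) if _n > 1
tabulate gap
```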
5) -tsset- query:
I have calculated the above averaged data on a spreadsheet and imported this to Stata. I have renamed each of the five-year periods as integers 1 through 5, and I have -tsset- as follows:
. tsset countryid period, g
panel variable: countryid, 1 to 68
time variable: period, 1 to 5
Since the time variable had to be an integer, this seemed to be the only way to get Stata to accept my five-year averaged data. Is this correct?
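As an alternative to renaming the periods by hand, the index could presumably be built inside Stata. The sketch below assumes a hypothetical variable startyear holding the first year of each five-year window, which is not in my data as imported, and assumes that period has not already been created. The commented-out line relies on the delta() option of -tsset-, which is only available in Stata releases that support it.

```stata
* Sketch, assuming a hypothetical variable startyear (first year of
* each five-year window) and that period does not yet exist.
summarize startyear, meanonly
generate int period = (startyear - r(min))/5 + 1   // 1, 2, 3, ...
tsset countryid period

* Alternative, where delta() is supported: keep calendar years and
* declare a five-year spacing, so that l.y means "five years earlier".
* tsset countryid startyear, delta(5)
```

Either way the lag operator l. should then refer to the previous five-year period, which is what the model above requires.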
Any insight or comments on any, some, or all of the above points would be gratefully received.
Yours sincerely,
Robert Scott
****************************
Robert Scott, B.A., M.Phil
Ph.D. Student
Department of Economics
Adam Smith Building
University of Glasgow
GLASGOW G12 8RT
Scotland
United Kingdom
****************************