st: new feature in xtabond2
"David Roodman (email@example.com)" <DRoodman@cgdev.org>
Mon, 9 Jul 2012 17:49:58 +0000
Inspired by Jens Mehrhoff's "A Solution to the Problem of Too Many Instruments in Dynamic Panel Data GMM," which is a sort of reply to my "A Note on the Theme of Too Many Instruments," I've added a feature to xtabond2. It is meant to provide a minimally arbitrary way to limit the instrument count while minimizing loss of identifying information. It gives the user a way to control this trade-off.
You can now have xtabond2 apply principal components analysis (PCA) to the "GMM"-style instruments in order to produce a smaller instrument set that is maximally representative of the original. The option -pca- triggers this behavior. By default, xtabond2 will retain all principal components corresponding to eigenvalues of the correlation matrix that are greater than 1. You can override this default with the -components(#)- option. E.g., components(5) will cause it to retain the components with the 5 largest eigenvalues.
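The retention rule can be illustrated with a minimal sketch in Python with NumPy. This is not xtabond2's code; the function name `retained_components` and the simulated instrument matrix `Z` are hypothetical, and the sketch only shows the eigenvalue-greater-than-1 default and the override by a fixed number of components.

```python
import numpy as np

def retained_components(Z, components=None):
    """Eigenvalues and eigenvectors of the retained principal components.

    Z is an (observations x instruments) array. With components=None, keep
    every component whose correlation-matrix eigenvalue exceeds 1; otherwise
    keep the `components` largest. (Hypothetical helper, for illustration.)
    """
    R = np.corrcoef(Z, rowvar=False)          # correlation matrix of the instruments
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if components is None:
        k = int(np.sum(eigvals > 1.0))        # default: eigenvalues greater than 1
    else:
        k = components                        # override: the k largest eigenvalues
    return eigvals[:k], eigvecs[:, :k]

# Simulated data (an assumption): six instruments sharing one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
Z = factor + 0.3 * rng.normal(size=(500, 6))

default_vals, default_vecs = retained_components(Z)
five_vals, five_vecs = retained_components(Z, components=5)
```

Because the six simulated columns are driven by a single factor, only one eigenvalue exceeds 1 and the default keeps one component, while the override keeps five.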
Technically, PCA is applied to de-meaned variables. But xtabond2 computes the retained components as linear combinations of the original instruments, not the de-meaned ones. This is because de-meaning subtracts a constant from an instrument. If the constant is itself an instrument, this does no harm. But it might not be, in which case the identifying assumptions would be materially changed.
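The point about de-meaning can be checked numerically. The sketch below, in Python with NumPy, uses simulated data and raw eigenvectors as weights; both are assumptions, and any scaling xtabond2 applies to the weights is not modeled. It shows that applying the same weights to the original and to the de-meaned instruments gives components that differ only by a constant.

```python
import numpy as np

# Simulated instruments with non-zero means (an assumption for illustration).
rng = np.random.default_rng(1)
Z = rng.normal(loc=5.0, size=(200, 4))
Z[:, 1] += Z[:, 0]                       # make two columns correlated

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)     # ascending eigenvalue order
W = eigvecs[:, ::-1][:, :2]              # weights for the two largest eigenvalues

scores_original = Z @ W                      # combinations of the original instruments
scores_demeaned = (Z - Z.mean(axis=0)) @ W   # combinations of the de-meaned instruments

# The gap between the two is the same in every row: the instrument means times W.
gap = scores_original - scores_demeaned
constant_shift = Z.mean(axis=0) @ W
```

Each column of `gap` equals the corresponding entry of `constant_shift` in every row, which is the constant that de-meaning would implicitly subtract from the component.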
With PCA, xtabond2 also reports two measures of how well the retained components do or could represent the original instrument set. One is the portion of the instruments' total variance explained by the retained components. It is the sum of the eigenvalues of the retained components divided by the sum of the eigenvalues of all the components (the latter being the same as the number of components/instruments). The other statistic is the Kaiser-Meyer-Olkin measure of sampling adequacy, which is the sum of all the instruments' squared correlations divided by the sum of that quantity and the sum of all their squared partial correlations. If the partial correlations are large--if the KMO statistic is low--it is hard to represent the span of the instruments with a subset of principal components and the PCA process destroys more identifying information. ("help pca postestimation##kmo" links to Stata documentation of this statistic.)
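Both statistics are easy to compute by hand. Here is a minimal sketch in Python with NumPy, again on simulated data; the function names `variance_explained` and `kmo` are hypothetical, and the partial correlations are taken from the inverse of the correlation matrix, which is the usual textbook route and not necessarily how xtabond2 computes them.

```python
import numpy as np

def variance_explained(eigvals, k):
    """Share of total variance carried by the k largest eigenvalues."""
    vals = np.sort(eigvals)[::-1]
    return vals[:k].sum() / vals.sum()

def kmo(Z):
    """Overall Kaiser-Meyer-Olkin measure for the columns of Z."""
    R = np.corrcoef(Z, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    P = -R_inv / np.outer(d, d)               # off-diagonals are partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)                  # sum of squared correlations
    p2 = np.sum(P[off] ** 2)                  # sum of squared partial correlations
    return r2 / (r2 + p2)

# Simulated data (an assumption): six instruments sharing one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
Z = factor + 0.3 * rng.normal(size=(500, 6))

eigvals = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))
total = eigvals.sum()                         # equals the number of instruments
share_one = variance_explained(eigvals, 1)
kmo_value = kmo(Z)
```

With a strong common factor the partial correlations are small relative to the raw correlations, so the KMO value is close to 1 and a single component explains most of the variance.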
When using PCA in System GMM, I recommend doing "gmm(X, eq(diff)) gmm(X, eq(lev))" instead of "gmm(X)". As explained in "How to Do xtabond2" and in Arellano and Bover 1995, gmm(X) will generate the quadratically-numerous-in-T, "exploded" set of instruments based on X for the differenced equation, but only generate a linearly numerous set of instruments for the levels equation (those for the shortest lag depth used). It could generate the full, exploded set for the levels equation too, but the additional instruments would be mathematically redundant, containing no new identifying information. However, they would not be formally collinear with the normally retained instruments. And I think when doing PCA it is better to start with a symmetric instrument set, and let the PCA algorithm pare it down from there. "gmm(X, eq(diff)) gmm(X, eq(lev))" tricks xtabond2 into generating the full, exploded sets for both equations.
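To give a feel for the instrument counts involved, here is a rough sketch in Python. It assumes an idealized balanced panel with T periods, one instrumenting variable, and instruments starting at lag 2 with no upper lag limit; the function names are hypothetical, and the counts xtabond2 actually reports depend on the lag limits, gaps in the panel, and the other options in use.

```python
def exploded_count(T, min_lag=2):
    """One instrument per (period, available lag) pair: quadratic in T."""
    return sum(max(t - min_lag, 0) for t in range(1, T + 1))

def shortest_lag_count(T, min_lag=2):
    """One instrument per period, shortest lag depth only: linear in T."""
    return max(T - min_lag, 0)

T = 10
diff_eq = exploded_count(T)             # differenced equation under gmm(X): 36
lev_default = shortest_lag_count(T)     # levels equation under gmm(X): 8
lev_exploded = exploded_count(T)        # levels equation with its own exploded set: 36

default_total = diff_eq + lev_default       # 44
symmetric_total = diff_eq + lev_exploded    # 72
```

Under these assumptions the symmetric specification hands PCA 72 columns instead of 44 when T = 10, and the gap widens quadratically as T grows.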
Thanks to Jens Mehrhoff for comments.