From
mario fiorini <mariofiorini73@gmail.com>

To
statalist@hsphsun2.harvard.edu

Subject
st: deriving the BIC when the vce(robust) option is used

Date
Thu, 29 Nov 2012 09:10:11 +1100

Dear statalist,

using Stata 11.2, I was trying to derive the Bayesian Information Criterion (BIC) after a regression with the vce(robust) option, and noted that the BIC is computed using the rank of e(V). However, the rank of e(V) was lower than the number of coefficients.

What I think is happening is that I have a variable that is nonzero for only 1 observation in the estimation sample (I have a lot of dummy variables). Stata is clear about what happens in this case. From Stata:

"Is there a regressor that is nonzero for only 1 observation or for one cluster? The VCE you have just estimated is not of sufficient rank to perform the model test. This can happen if there is a variable in your model that is nonzero for only 1 observation in the estimation sample. Likewise, it can happen if a variable is nonzero for only one cluster when using the cluster-robust VCE. In such cases the derivative of the sum-of-squares or likelihood function with respect to that variable's parameter is zero for all observations. That implies that the outer-product-of-gradients (OPG) variance matrix is singular. Because the OPG variance matrix is used in computing the robust variance matrix, the latter is therefore singular as well."

However, what surprised me was that
1 - the reported e(df_m) was not equal to the actual number of coefficients
2 - the BIC is determined using the rank of e(V) rather than the actual number of coefficients

The code below replicates this situation, using in one case vce(ols) [the default] and in the other vce(robust)

* Start
clear
set obs 1000
ge id = _n
ge var2 = 0
replace var2=1 if id==1 // nonzero for only 1 observation
ge var3 = invnorm(uniform())
ge var4 = invnorm(uniform())

reg var3 var2 var4
ereturn list
estat ic

reg var3 var2 var4, vce(robust)
ereturn list
estat ic
* Ends

The estimated coefficients are the same in both cases, while the standard errors are not, as expected. However, note that e(df_m) and the BIC differ depending on the vce option.
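
For reference, here is a rough sketch of the arithmetic as I understand it. It uses the usual definition BIC = -2 ln L + k ln N. The labels k_ols and k_robust are my own notation for the parameter count used under each vce option. I am assuming that the constant is counted in k, that the log likelihood is identical in the two fits (the coefficients are the same), and that the robust e(V) loses exactly one rank in this example.

```latex
\begin{align*}
\mathrm{BIC} &= -2\ln L + k\,\ln N \\
\mathrm{BIC}_{\text{ols}} - \mathrm{BIC}_{\text{robust}}
  &= \left(k_{\text{ols}} - k_{\text{robust}}\right)\ln N \\
  &= (3 - 2)\,\ln 1000 \approx 6.91
\end{align*}
```

If those assumptions hold, the two BIC values should differ by about 6.91 even though the fitted model is the same.
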
Is this correct? Shouldn't e(df_m) always report the actual number of coefficients, and the BIC be calculated accordingly?

Any clarification would be great.

Mario Fiorini

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

