# st: deriving the BIC when the vce(robust) option is used

- From: mario fiorini
- To: statalist@hsphsun2.harvard.edu
- Subject: st: deriving the BIC when the vce(robust) option is used
- Date: Thu, 29 Nov 2012 09:10:11 +1100

```
Dear statalist,
Using Stata 11.2, I was trying to derive the Bayesian Information
Criterion (BIC) after a regression with the vce(robust) option, and
noted that the BIC is computed using the rank of e(V). However, the
rank of e(V) was lower than the number of coefficients. What I think
is happening is that I have a variable that is nonzero for only 1
observation in the estimation sample (I have a lot of dummy
variables). Stata is clear about what happens in this case. From Stata:

"Is there a regressor that is nonzero for only 1 observation or for one cluster?

The VCE you have just estimated is not of sufficient rank to perform the
model test.  This can happen if there is a variable in your model that is
nonzero for only 1 observation in the estimation sample.  Likewise, it
can happen if a variable is nonzero for only one cluster when using the
cluster-robust VCE.  In such cases the derivative of the sum-of-squares
or likelihood function with respect to that variable's parameter is zero
for all observations.  That implies that the outer-product-of-gradients
(OPG) variance matrix is singular.  Because the OPG variance matrix is
used in computing the robust variance matrix, the latter is therefore
singular as well."

However, two things surprised me:
1. the reported e(df_m) was not equal to the actual number of coefficients;
2. the BIC is computed using the rank of e(V) rather than the
   actual number of coefficients.

The code below replicates this situation, using vce(ols) (the default)
in one case and vce(robust) in the other.

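* Start
clear
set obs 1000
set seed 12345                // arbitrary seed, added so the draws are reproducible
generate id = _n
generate var2 = (id == 1)     // nonzero for only 1 observation
generate var3 = rnormal()     // outcome: standard normal draw
generate var4 = rnormal()     // ordinary continuous regressor

* Default (OLS) VCE
regress var3 var2 var4
ereturn list
estat ic

* Robust VCE: same coefficients, different e(V)
regress var3 var2 var4, vce(robust)
ereturn list
estat ic

* Ends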

The estimated coefficients are the same in both cases, while the
standard errors are not, as expected. However, note that e(df_m)
and the BIC differ depending on the vce() option.
Is this correct? Shouldn't e(df_m) always reflect the actual number of
coefficients, and the BIC be calculated accordingly?
Any clarification would be great.

Mario Fiorini
```
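The following derivation is a sketch of why both observations follow from the simulated example. It assumes, as the post reports, that `estat ic` takes the parameter count $k$ to be the rank of e(V). The notation ($x_{2i}$ for var2, $\hat e_i$ for the OLS residuals, $M$ for the "meat" of the sandwich) is introduced here for illustration only and does not appear in the original message.

```latex
% Example model: y_i = \beta_0 + \beta_2 x_{2i} + \beta_4 x_{4i} + e_i,
% with x_{2i} = 1 if i = 1 and 0 otherwise; k = 3 coefficients, N = 1000.

% 1. The OLS normal equation for \beta_2 forces the first residual to zero:
\sum_{i=1}^{N} x_{2i}\,\hat e_i = \hat e_1 = 0

% 2. Robust (sandwich) VCE, up to the finite-sample factor N/(N-k):
\hat V = (X'X)^{-1}\, M \,(X'X)^{-1}, \qquad
M = \sum_{i=1}^{N} \hat e_i^{2}\, x_i x_i'

% Every entry in the row/column of M belonging to x_2 contains x_{2i}\hat e_i:
M_{2j} = \sum_{i=1}^{N} \hat e_i^{2}\, x_{2i}\, x_{ji} = \hat e_1^{2}\, x_{j1} = 0
\quad\Longrightarrow\quad
\operatorname{rank}(\hat V) = \operatorname{rank}(M) \le k - 1 = 2

% 3. BIC with k taken as rank(e(V)):
\mathrm{BIC} = -2\ln L + k \ln N
% \ln L depends only on the residuals, which are identical under both vce() options,
% so if rank(e(V)) is 3 under the default VCE and 2 under vce(robust):
\mathrm{BIC}_{\text{robust}} - \mathrm{BIC}_{\text{ols}} = (2 - 3)\ln N = -\ln 1000 \approx -6.91
```

Because $(X'X)^{-1}$ is nonsingular, the rank of $\hat V$ equals the rank of $M$, so the single zero row and column in $M$ are enough to lower rank(e(V)) by one. Under the default VCE, $\hat V = s^2 (X'X)^{-1}$ has full rank, which is why only the robust run shows the drop.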