



st: deriving the BIC when the vce(robust) option is used


From   mario fiorini <mariofiorini73@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: deriving the BIC when the vce(robust) option is used
Date   Thu, 29 Nov 2012 09:10:11 +1100

Dear statalist,
using Stata 11.2, I was trying to derive the Bayesian Information
Criterion (BIC) after a regression with the vce(robust) option, and
noted that the BIC is computed using the rank of e(V). However, the
rank of e(V) was lower than the number of coefficients. What I think
is happening is that I have a variable that is nonzero for only 1
observation in the estimation sample (I have a lot of dummy
variables). Stata is clear about what happens in this case. From Stata:

"Is there a regressor that is nonzero for only 1 observation or for one cluster?

    The VCE you have just estimated is not of sufficient rank to perform the
    model test.  This can happen if there is a variable in your model that is
    nonzero for only 1 observation in the estimation sample.  Likewise, it
    can happen if a variable is nonzero for only one cluster when using the
    cluster-robust VCE.  In such cases the derivative of the sum-of-squares
    or likelihood function with respect to that variable's parameter is zero
    for all observations.  That implies that the outer-product-of-gradients
    (OPG) variance matrix is singular.  Because the OPG variance matrix is
    used in computing the robust variance matrix, the latter is therefore
    singular as well."
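
A sketch of why this holds for linear regression, in notation that is assumed here rather than taken from the Stata help text: x_{ij} is regressor j for observation i, \hat e_i is the OLS residual, p is the number of coefficients, and regressor j is nonzero only for observation k. The scaling constant applied to the robust estimator is omitted because it does not affect the rank.

```latex
% Assumed notation, not from the Stata documentation.
\begin{align*}
\sum_{i=1}^{N} x_{ij}\,\hat e_i = 0
  &\;\Longrightarrow\; x_{kj}\,\hat e_k = 0
  \;\Longrightarrow\; \hat e_k = 0, \\
\hat M = \sum_{i=1}^{N} \hat e_i^{2}\, x_i x_i'
  &\;\Longrightarrow\; \hat M_{j\ell} = \sum_{i=1}^{N} \hat e_i^{2}\, x_{ij}\, x_{i\ell} = 0
  \quad \text{for all } \ell, \\
\hat V_{\text{robust}} \propto (X'X)^{-1}\, \hat M\, (X'X)^{-1}
  &\;\Longrightarrow\; \operatorname{rank}(\hat V_{\text{robust}})
  = \operatorname{rank}(\hat M) \le p - 1 .
\end{align*}
```

The first line is the OLS normal equation for regressor j, which forces the residual of observation k to zero. Row and column j of the middle matrix then vanish, so the sandwich estimator cannot have full rank.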

However, what surprised me was that
1) the reported e(df_m) was not equal to the actual number of
coefficients, and
2) the BIC is determined using the rank of e(V) rather than the
actual number of coefficients.

The code below replicates this situation, using in one case vce(ols)
[the default] and in the other vce(robust)

 * Start
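clear
set obs 1000
generate id = _n
generate var2 = 0
replace var2 = 1 if id == 1 // nonzero for only 1 observation
generate var3 = invnorm(uniform()) // standard normal draws
generate var4 = invnorm(uniform())

* default VCE, i.e. vce(ols)
regress var3 var2 var4
ereturn list
estat ic

* robust VCE: same coefficients, different e(V)
regress var3 var2 var4, vce(robust)
ereturn list
estat ic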

 * Ends
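
The comparison can be made explicit by recomputing the BIC by hand after each fit. This is only a sketch. It assumes the data generated above are still in memory, that regress leaves e(ll), e(N) and e(rank) behind, and that the BIC follows the usual formula -2*ln(L) + k*ln(N). The scalar names bic_rank and bic_k are made up for illustration.

```stata
* Sketch: BIC from the rank of e(V) versus BIC from the coefficient count
* (assumes var2, var3, var4 from the code above are in memory)
foreach v in ols robust {
    quietly regress var3 var2 var4, vce(`v')

    * number of estimated coefficients, including the constant
    local k = colsof(e(b))

    * BIC with df taken from the rank of e(V)
    scalar bic_rank = -2*e(ll) + e(rank)*ln(e(N))

    * BIC with df taken from the number of coefficients
    scalar bic_k = -2*e(ll) + `k'*ln(e(N))

    display "vce(`v'): e(df_m) = " e(df_m) ", e(rank) = " e(rank) ", k = `k'"
    display "  BIC using e(rank) = " bic_rank "   BIC using k = " bic_k
}
```

If the two BIC values agree under vce(ols) but differ under vce(robust), the gap comes entirely from the degrees-of-freedom count and not from the log likelihood.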

The estimated coefficients are the same in both cases, while the
standard errors are not, as expected. However, note that the e(df_m)
and BIC are different depending on the vce option.
Is this correct? Shouldn't e(df_m) always report the actual number of
coefficients and the BIC be calculated accordingly?
Any clarification would be great.

Mario Fiorini