The Stata listserver

Re: st: Robust variances

From   Mark Schaffer <>
To, Constantine Daskalakis <>
Subject   Re: st: Robust variances
Date   Sat, 24 May 2003 00:44:22 +0100 (BST)


Quoting Constantine Daskalakis <>:

> At 04:48 PM 5/23/03, Mark Schaffer wrote:
> >Hi everybody.  With respect to clustering,
> >
> >Personally, I think this problem is VERY easy to stumble into [decoded =
> >I've done it myself] and could do with much more highlighting in the
> >manuals, and in the on-line help and error messages.
> >
> >--Mark
> I am confused.
> Consider the simplest case where I compute a robust variance from all the
> data (ie, a single cluster). Are you saying that I get information
> equivalent to an observation of 1?

If you have a single cluster and you treat each observation within the 
cluster as independent, then the Omega_hat matrix used in the matrix 
product 1/N * X' * Omega_hat * X is diagonal, with the squared residuals on 
the diagonal.  Each observation contributes its own residual.  This would be the standard 
Eicker-Huber-White-"sandwich" robust (but not cluster-robust) covariance 
estimator.  No problem here.
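
As a rough illustration of the diagonal case (a sketch only, in Python rather than Stata; the data, the residuals, and the function name white_meat are all made up for this example), the middle term X' * Omega_hat * X is just a sum of e_i^2 * x_i * x_i' over observations:

```python
# Sketch: the "meat" X' * Omega_hat * X when Omega_hat = diag(e_i^2).
# X and e are made-up illustrative values; e was chosen to be orthogonal
# to the columns of X, as OLS residuals would be.

def white_meat(X, e):
    """Return sum_i e_i^2 * x_i * x_i' as a k-by-k list of lists."""
    k = len(X[0])
    meat = [[0.0] * k for _ in range(k)]
    for x_i, e_i in zip(X, e):
        for a in range(k):
            for b in range(k):
                meat[a][b] += e_i ** 2 * x_i[a] * x_i[b]
    return meat

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # constant and one regressor
e = [1.0, -2.0, 1.0, 0.0]                              # hypothetical residuals

meat = white_meat(X, e)
print(meat)  # each observation adds its own e_i^2 * x_i * x_i' term
```

Every observation enters separately here, so N observations give N distinct contributions to the sum.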

This isn't what Stata does for cluster-robust SEs.  The point of using 
cluster-robust SEs is to relax the independence assumption: the estimate of 
the var-cov matrix is robust to arbitrary intra-cluster correlation.  
Observations within a cluster can be correlated (or not) in any fashion; it 
doesn't matter so long as you assume that observations across clusters are 
independent.  To get the cluster-robust SEs, Stata aggregates clusters to 
get the "super-observations" to which I referred in my previous email.  The 
Omega_hat matrix in this case is block diagonal - each block consists of 
the "super-observation" contributed by a cluster, and the off-diagonal 
blocks are zeros because of the independence-across-clusters assumption.
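
The aggregation step can be sketched as follows. This is an illustration under my own assumptions, not Stata's actual code: the Python function cluster_meat, the data, the residuals, and the cluster ids are all hypothetical. Each cluster's rows of X are weighted by their residuals and summed into one vector s_g, and the middle term is the sum of s_g * s_g' over clusters:

```python
# Sketch: cluster-robust "meat" = sum over clusters g of s_g * s_g',
# where s_g = X_g' * e_g is the cluster's summed score (the "super-observation").
# Data, residuals, and cluster ids are made up for illustration.

def cluster_meat(X, e, cluster):
    """Return (meat, scores): the k-by-k meat and the per-cluster sums s_g."""
    k = len(X[0])
    scores = {}  # cluster id -> summed score vector s_g
    for x_i, e_i, g in zip(X, e, cluster):
        s_g = scores.setdefault(g, [0.0] * k)
        for a in range(k):
            s_g[a] += e_i * x_i[a]
    meat = [[0.0] * k for _ in range(k)]
    for s_g in scores.values():
        for a in range(k):
            for b in range(k):
                meat[a][b] += s_g[a] * s_g[b]
    return meat, scores

X = [[1.0, float(x)] for x in range(6)]  # constant and one regressor
e = [1.0, -1.0, 0.0, -1.0, 1.0, 0.0]     # hypothetical residuals, orthogonal to X
cluster = [0, 0, 1, 1, 2, 2]             # three clusters of two observations

meat, scores = cluster_meat(X, e, cluster)
print(scores)  # one summed vector per cluster
print(meat)
```

Six observations contribute only three terms to the sum, one per cluster, which is the sense in which the clusters and not the observations are the effective units.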

Now if you have a single cluster, you aggregate and you get a single 
super-observation.  The rank of the cluster-robust var-cov matrix will be at 
most one (see help j_robustsingular from within Stata for more on this) and 
of course inference will be impossible.
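
A small numerical check of the single-cluster case, again as a sketch in Python with made-up data and a hypothetical helper single_cluster_meat. With one cluster the middle term is the outer product of a single vector s = X'e with itself, so it is singular whatever the residuals are; and since OLS residuals are orthogonal to X, in the OLS case the vector s is itself zero:

```python
# Sketch: with one cluster there is a single summed score s = X'e, and the
# "meat" s * s' is an outer product, so its rank cannot exceed one.
# Data are made up; OLS is done by the closed form for one regressor.

def single_cluster_meat(X, e):
    """Return s * s' where s = X'e is summed over all observations."""
    k = len(X[0])
    s = [sum(e_i * x_i[a] for x_i, e_i in zip(X, e)) for a in range(k)]
    return [[s[a] * s[b] for b in range(k)] for a in range(k)]

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 0.0, 3.0, 4.0]
X = [[1.0, x_i] for x_i in x]

# Closed-form OLS fit of y on a constant and x.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
         / sum((a - x_bar) ** 2 for a in x))
intercept = y_bar - slope * x_bar
ols_resid = [b - intercept - slope * a for a, b in zip(x, y)]

# Any vector u in place of the residuals gives a singular 2x2 meat ...
u = [1.0, 2.0, 3.0, 4.0]
m_u = single_cluster_meat(X, u)
det_u = m_u[0][0] * m_u[1][1] - m_u[0][1] * m_u[1][0]

# ... and OLS residuals are orthogonal to X, so the meat is (numerically) zero.
m_ols = single_cluster_meat(X, ols_resid)
largest = max(abs(v) for row in m_ols for v in row)

print(det_u, largest)
```

Either way there is nothing left from which to estimate a two-parameter covariance matrix.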

It makes sense if you think about it.  Having standard errors that are 
robust to arbitrary intra-cluster correlation means that *any* correlation 
between observations within a cluster is OK, and the SEs will still be 
consistent.  No scheme for estimating the variance-covariance matrix will 
get you this if you have only one cluster!  This is equivalent to asking 
for consistent SEs without imposing any structure on the 
variance-covariance matrix at all.  You have N observations but N*(N-1)/2 
correlations, and there's no way you can estimate them all from that.
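
To put numbers on the counting argument (a trivial Python sketch; the function name is made up), compare N with the number of distinct pairs N*(N-1)/2:

```python
# Sketch: number of distinct pairwise correlations among N observations,
# compared with the N observations available to estimate them.

def n_pairwise_correlations(n):
    """Number of distinct unordered pairs among n observations."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, n_pairwise_correlations(n))
```

The number of unknowns grows with the square of N, while the data grow only linearly.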

Minor point in passing - the Stata manuals refer to Rogers (1993) as the 
source for the cluster-robust approach (I think he used to work at Stata 
Corp) but as far as I can tell, Hal White should get the credit - it's 
described in his 1984 book Asymptotic Theory for Econometricians.


> That is certainly not the case.
> Only if the actual correlation is 1, I will get an "effective" N of 1. If
> the actual correlation is 0, I will get an "effective" N similar to my
> original observations. In this simple case, are you then saying that the
> robust variance is nonsense? The Liang and Zeger GEE approach does exactly
> that and it's been shown to be consistent in lots of situations, so your
> point must be different.
> Maybe you're arguing that, with a single cluster, the robust variance is
> fine, but when you sum across clusters, then you have to have a "large
> number" of clusters?
> Any comments/direction from the good Stata people on this?
> cd
> Constantine Daskalakis, ScD
> Assistant Professor,
> Biostatistics Section, Thomas Jefferson University,
> 125 S. 9th St. #402, Philadelphia, PA 19107
>     Tel: 215-955-5695
>     Fax: 215-503-3804

Prof. Mark Schaffer
Director, CERT
Department of Economics
School of Management & Languages
Heriot-Watt University, Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3008

