# Re: st: Robust variances

 From Mark Schaffer <[email protected]> To [email protected], Constantine Daskalakis <[email protected]> Subject Re: st: Robust variances Date Sat, 24 May 2003 00:44:22 +0100 (BST)

```
Constantine,

> At 04:48 PM 5/23/03, Mark Schaffer wrote:
> >Hi everybody.  With respect to clustering,
> >
> >Personally, I think this problem is VERY easy to stumble into
> >[decoded = I've done it myself] and could do with much more
> >highlighting in the manuals, and in the on-line help and error
> >messages.
> >
> >--Mark
>
> I am confused.
>
> Consider the simplest case where I compute a robust variance from all
> the data (ie, a single cluster). Are you saying that I get information
> equivalent to an observation of 1?

If you have a single cluster and you treat each observation within the
cluster as independent, then the Omega_hat matrix used in the matrix
product 1/N * X' * Omega_hat * X is diagonal, with the squared residuals
on the diagonal.  Each observation contributes its own squared residual.
This is the standard Eicker-Huber-White "sandwich" robust (but not
cluster-robust) covariance estimator.  No problem here.

This isn't what Stata does for cluster-robust SEs.  The point of using
cluster-robust SEs is to relax the independence assumption: the estimate of
the var-cov matrix is robust to arbitrary intra-cluster correlation.
Observations within a cluster can be correlated (or not) in any fashion; it
doesn't matter, so long as you assume that observations across clusters are
independent.  To get the cluster-robust SEs, Stata aggregates within each
cluster to get the "super-observations" to which I referred in my previous
email.  The Omega_hat matrix in this case is block diagonal - each block
consists of the "super-observation" contributed by a cluster, and the
off-diagonal blocks are zeros because of the independence-across-clusters
assumption.

Now if you have a single cluster, you aggregate and you get a single
super-observation.  The rank of the cluster-robust var-cov matrix will be
at most one (see help j_robustsingular from within Stata for more on this)
and of course inference will be impossible.

It makes sense if you think about it.  Standard errors that are robust to
arbitrary intra-cluster correlation mean that *any* correlation between
observations within a cluster is OK, and the SEs will still be consistent.
No scheme for estimating the variance-covariance matrix will get you this
if you have only one cluster!  This is equivalent to asking for consistent
SEs without imposing any structure on the variance-covariance matrix at
all.  You have N observations but N*(N-1)/2 pairwise correlations, and
there's no way you can estimate all of those from N observations.

Minor point in passing - the Stata manuals refer to Rogers (1993) as the
source for the cluster-robust approach (I think he used to work at Stata
Corp), but as far as I can tell Hal White should get the credit.  It's
described in his 1984 book Asymptotic Theory for Econometricians.

--Mark

> That is certainly not the case. Only if the actual correlation is 1,
> I will get an "effective" N of 1. If the actual correlation is 0, I
> will get an "effective" N similar to my original observations. In this
> simple case, are you then saying that the robust variance is nonsense?
> The Liang and Zeger GEE approach does exactly that and it's been shown
> to be consistent in lots of situations, so your point must be
> different.
>
> Maybe you're arguing that, with a single cluster, the robust variance
> is fine, but when you sum across clusters, then you have to have a
> "large number" of clusters?
>
> Any comments/direction from the good Stata people on this?
>
> cd
>
> Assistant Professor,
> Biostatistics Section, Thomas Jefferson University,
> 125 S. 9th St. #402, Philadelphia, PA 19107
>     Tel: 215-955-5695
>     Fax: 215-503-3804
>     Email: [email protected]
>     Webpage:
> http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

Prof. Mark Schaffer
Director, CERT
Department of Economics
School of Management & Languages
Heriot-Watt University, Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3008
email: [email protected]
web: http://www.sml.hw.ac.uk/ecomes
```
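The argument in the message can be checked numerically. Below is a minimal sketch in plain Python (not Stata's implementation), using made-up data for a regression of y on a constant and one regressor x. The function names `ols_residuals`, `meat` and `det2` are hypothetical helpers introduced only for this illustration. The sketch builds the "meat" X' * Omega_hat * X of the sandwich in three ways: every observation as its own cluster (the Eicker-Huber-White case), three clusters, and a single cluster. One assumption worth stating: because the residuals here come from OLS, the single-cluster score sum X'e is zero by the normal equations, so the single-cluster meat collapses to (numerically) zero, which is the extreme case of "rank at most one".

```python
# Toy illustration with hypothetical data; standard library only.

def ols_residuals(x, y):
    """Fit y = a + b*x by OLS and return the residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def meat(x, resid, cluster_ids):
    """Return the 2x2 sum over clusters of s_g * s_g', with s_g = X_g' * e_g.

    Giving every observation its own cluster id reproduces the
    heteroskedasticity-robust meat X' * diag(e^2) * X.
    """
    sums = {}
    for xi, ei, g in zip(x, resid, cluster_ids):
        s = sums.setdefault(g, [0.0, 0.0])
        s[0] += ei        # constant column times residual
        s[1] += xi * ei   # x column times residual
    m = [[0.0, 0.0], [0.0, 0.0]]
    for s in sums.values():
        for r in range(2):
            for c in range(2):
                m[r][c] += s[r] * s[c]
    return m

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
e = ols_residuals(x, y)
n = len(x)

white = meat(x, e, list(range(n)))       # each observation is its own cluster
three = meat(x, e, [0, 0, 1, 1, 2, 2])   # three clusters of two observations
single = meat(x, e, [0] * n)             # everything in one cluster

# Number of distinct pairwise correlations among n observations.
pairs = n * (n - 1) // 2

print("White meat determinant:", det2(white))
print("Three-cluster meat:", three)
print("Single-cluster meat:", single)
print("Observations:", n, "pairwise correlations:", pairs)
```

With one cluster there is only one "super-observation" to average over, so the meat carries no usable information about sampling variability, whereas the observation-level version has a strictly positive determinant on this data. The last line prints the count behind the N*(N-1)/2 remark: 6 observations against 15 pairwise correlations.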