
Notice: On April 23, 2014, Statalist moved from an email list to a forum.


Re: st: RE: clustered standard errors

From   Robert Lineira
Subject   Re: st: RE: clustered standard errors
Date   Fri, 30 Apr 2010 17:08:39 +0200

Thanks a lot, Steve, for answering so quickly.

I will try to implement your proposal!


On 29/04/2010 20:07, Steve Samuels wrote:

Are any of the "local forces" or "national forces" identical for all
voters in a region? For those forces that do not vary within a
region, your sample size is effectively n = 17.

It appears that regions are not "clusters" but strata, which by
definition are units that together constitute the entire population of
interest. One approach to this analysis, and the one I recommend in
the absence of other information, is to treat regional differences as
fixed effects and to use Stata's -svyset- command to specify the
design. The strata would be the original strata in the 17 surveys,
suitably coded so that there are no duplicate numbers. In the
analysis, you would have a fixed indicator of region, but also
regional variables that might explain the regional differences.
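
A minimal sketch of the setup described above, not taken from the thread. All variable names (region, stratum, psu, wt, turnout, local_x, national_x) are hypothetical placeholders, and the sketch assumes the original surveys supply stratum identifiers, cluster (PSU) identifiers, and sampling weights:

```stata
* Hypothetical variables: region (1-17), stratum and psu (identifiers from
* each original survey), wt (sampling weight), turnout (0/1 outcome),
* local_x and national_x (individual-level explanatory variables).

* Recode strata and PSUs so that identifiers are unique across the 17 surveys
egen long stratum_all = group(region stratum)
egen long psu_all     = group(region stratum psu)

* Declare the pooled design: original PSUs, weights, and recoded strata
svyset psu_all [pweight = wt], strata(stratum_all)

* Region enters as a fixed effect through indicator variables
svy: logit turnout i.region local_x national_x
```

The choice of -logit- here is only an assumption about the outcome; any estimation command that supports the svy prefix could take its place.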

Although the samples in each region can be considered random, they
are not "simple random samples". Pooling the samples without
adjustment for the original sample designs will give biased estimates
(if the analysis is not weighted) and improper standard errors.


On Thu, Apr 29, 2010 at 11:46 AM, Robert Lineira wrote:

The populations are the 17 Spanish regions, and the samples are post-election
surveys in each region. The purpose of the analysis is to look for variation
in the strength of local and national forces on voting and turnout.

Although the multi-stage sampling procedure uses some strata and clusters
to select the individuals, the samples may be considered random samples of
voters in each region. The pool of samples consists of the aggregation of
these random samples.

I hope this gives a better idea of the research.

Thanks in advance!

On 29/04/2010 14:06, Steve Samuels wrote:

I wonder what the purpose of the analysis is, what the sampled
populations are, and what the sample designs are. Survey samples can
be complex creations with their own strata and clusters. Until Robert
provides more detail, I'm not sure that 1 sample = 1 cluster.


On Thu, Apr 29, 2010 at 6:03 AM, Schaffer, Mark E wrote:


-----Original Message-----
On Behalf Of Robert Lineira
Sent: 29 April 2010 10:08
Subject: st: clustered standard errors

Dear all,

I found on the net a presentation by Austin Nichols and Mark
Schaffer on clustered standard errors. After reading it, I have
some questions about how to use them.

I want to run an analysis using a pool of 17 survey samples.
Presumably, errors will be correlated within the clusters, but
the presentation warns that using clustered standard errors
might be a very bad solution. They suggest performing some tests
before using the corrected errors, by running the 'cltest' and
'xtcltest' Stata commands.
Unfortunately, I only found a 'cltest' command, and I am not sure
it is the same one they use, given that it predates the Kézdi
(2007) paper they cite.

No, that's a different test.  The test code Austin and I referred to in
our presentation is still languishing in alpha testing.

But I'm not sure it or other tests can help you.

The problem is that this test, like White's general heteroskedasticity
test and related tests, works via a vector-of-contrasts.  The contrast is
between the elements of the robust and non-robust VCVs.

Under the null, the robust VCV is consistent.  If the non-robust VCV is
also consistent, its elements will be similar to those of the robust VCV,
and the vector of contrasts will be small.  If the non-robust VCV is
inconsistent, the contrast will be large.

You can see the problem now.  To do this or a related test in your
application, you need a robust VCV that is consistent.  Your cluster-robust
VCV is indeed consistent, but with only 17 clusters, you are not very far
along the way to infinity, and it's likely to be a poor estimator of the
VCV.  Contrasting it with the non-robust VCV is not going to give you a
reliable test - the contrast could be big because the cluster-robust VCV is
poor, for example.
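
To make the contrast described above concrete, the two VCV estimates can be compared by hand. This is only an informal illustration with hypothetical variable names (y, x1, x2, region); it is not the test from the presentation, and with only 17 clusters the differences it shows are subject to exactly the caveat just given:

```stata
* Hypothetical variables: y (outcome), x1 x2 (regressors), region (cluster id)

* Non-robust (classical) VCV
regress y x1 x2
matrix V_nonrobust = e(V)

* Cluster-robust VCV, clustering on region
regress y x1 x2, vce(cluster region)
matrix V_cluster = e(V)

* Element-by-element contrast between the two estimates
matrix D = V_cluster - V_nonrobust
matrix list D
```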

Hope this helps.


My question is whether anyone knows of a test that I could use
before applying clustered standard errors and, if not, which
solution you would recommend in a case such as this.






Robert Liñeira
Dpt. Ciència Política - UAB
08193 Bellaterra - Barcelona
Tlf: +34 93 581 46 33
Despatx B1-185
