Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Small sample with clustered data

From   Austin Nichols <>
Subject   Re: st: Small sample with clustered data
Date   Tue, 29 Nov 2011 06:19:24 -0500

Lars <>:
You can estimate the bias in the SE via simulation of data just like
yours where you control the correlations and actual treatment effects;
if the rejection rate of a nominal 1% test is in the 5% range for some
coefficients, and tests for other coefficients (vars with no
clustering) have correct size, perhaps you just use a higher standard
of "significance" for some coefs than others.  You will have to run
millions or at least hundreds of thousands of simulations, though,
which will take some time...  It is faster to just add the caveat that
"significance of results must be interpreted with caution" with OIM,
het-robust, or cluster-robust SEs.
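
A minimal sketch of such a simulation, in plain Python rather than Stata, might look like the following. Everything in it is an assumption made for illustration, not the poster's actual data: 50 countries with one observation each, 26 EU members sharing one common shock (`shock_sd` controls the correlation), a single unclustered regressor `x`, true effects of zero, and a nominal 5% test instead of the 1% test mentioned above. The standard errors use the n/(n-k) sandwich correction, which with one observation per country should coincide with clustering by country. The function names are hypothetical.

```python
import random

N_COUNTRIES = 50
N_EU = 26
T_CRIT = 2.012  # approximate two-sided 5% critical value, t with 47 df


def invert_3x3(a):
    """Invert a 3x3 matrix by Gauss-Jordan elimination with pivoting."""
    n = 3
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_robust(X, y):
    """OLS coefficients with heteroskedasticity-robust (n/(n-k)) SEs."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert_3x3(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    # "Meat" of the sandwich: sum of e_i^2 * x_i x_i'
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(X, resid))
             for j in range(k)] for i in range(k)]
    tmp = [[sum(inv[i][l] * meat[l][j] for l in range(k))
            for j in range(k)] for i in range(k)]
    scale = n / (n - k)
    se = [(scale * sum(tmp[i][l] * inv[l][i] for l in range(k))) ** 0.5
          for i in range(k)]
    return b, se


def rejection_rates(reps=2000, shock_sd=0.5, seed=12345):
    """Share of replications rejecting a true null for x and for the EU dummy."""
    rng = random.Random(seed)
    eu = [1.0 if i < N_EU else 0.0 for i in range(N_COUNTRIES)]
    rej_x = rej_eu = 0
    for _ in range(reps):
        u = rng.gauss(0.0, shock_sd)  # one shock shared by all EU members
        x = [rng.gauss(0.0, 1.0) for _ in range(N_COUNTRIES)]
        y = [u * d + rng.gauss(0.0, 1.0) for d in eu]  # true effects are zero
        X = [[1.0, xi, d] for xi, d in zip(x, eu)]
        b, se = ols_robust(X, y)
        rej_x += abs(b[1] / se[1]) > T_CRIT
        rej_eu += abs(b[2] / se[2]) > T_CRIT
    return rej_x / reps, rej_eu / reps


if __name__ == "__main__":
    rate_x, rate_eu = rejection_rates()
    print(f"rejection rate, x:        {rate_x:.3f}")
    print(f"rejection rate, EU dummy: {rate_eu:.3f}")
```

In this toy design the test on the unclustered regressor should reject at close to the nominal rate, while the test on the EU dummy should reject far more often, because the shared shock is a single draw that the SE does not account for. A few thousand replications are enough to see that pattern; pinning down the size of a 1% test precisely would need many more.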

On Mon, Nov 28, 2011 at 4:14 PM,  <> wrote:
> Dear Austin,
> thank you for your reply. If I understand you correctly,
> you suggest using cluster(countryid) after the regression, while
> controlling for euclus. Countryid is a number from 1 to 50. This works.
> The results are the same as when I use the robust option after the regression.
> So do you think this is the best option, and that I should state that the SEs are
> probably biased downward and thus significant results have to be interpreted with caution?
> What if the coefficients are still significant even when I do not use the cluster option? Is there a way
> to estimate the bias?
> Best
> Lars
> -----Original Message-----
> From: "Austin Nichols" <>
> Sent: 28.11.2011 20:00:41
> To:
> Subject: Re: st: Small sample with clustered data
>>Lars <>:
>>You are likely to have SEs biased downward no matter what you do, if
>>you use the 24 cluster design--can you cluster by country (50
>>clusters) but include eucluster as an explanatory variable?
>>On Mon, Nov 28, 2011 at 6:24 AM, <> wrote:
>>> Dear Statalist,
>>> My sample consists of 50 countries with 26 of them being EU Member States.
>>> The problem is that the values of the dependent variable for the EU members are not
>>> independent of each other. Thus, I created a dummy variable "eucluster" that indicates
>>> if a country is in the EU (1=yes; 0=no) and used the cluster(eucluster) option after the
>>> OLS Regressions in Stata 10. However, in "Clustered Errors in Stata"
>>> (Nichols/Schaffer 2007), it is mentioned that if M,
>>> the number of clusters, is small, matters could even get worse by using the cluster option (slide 20).
>>> M=50 seems to be the minimum number of clusters required.
>>> I have 24 clusters consisting of 1 country and 1 cluster comprising 26 EU members (6 independent variables).
>>> I do not know how to deal "correctly" with these clustered data in Stata. Hence, I would highly appreciate it if someone could
>>> give me advice or suggest a solution for dealing with clustered data in such a small sample.
