
# Re: st: Clustering of Standard Errors in a fixed effect model.

From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Clustering of Standard Errors in a fixed effect model.
Date: Mon, 21 Jun 2010 06:31:21 -0700 (PDT)

```
--- Austin Nichols wrote:
> > The number of clusters and how balanced they are determine the
> > and refs therein, and for the follow-up see
> > http://www.stata.com/meeting/boston10/abstracts.html#baum

--- On Mon, 21/6/10, natasha agarwal wrote:
> Thanks Austin. I have read this paper.
> On this note, does it mean that if I have 30 clusters with
> a very unbalanced cluster size like one cluster size being 2000
> observations and the other say 30 observations will give me
> inconsistent results?

I like to use simulations in order to get an idea of how big a
certain problem is for your data.

1) estimate the model of interest on your data
2) store the parameter of interest, for the purpose of this
simulation this will be regarded as the "population value".
3) use -bsample- (with the -cluster()- option) to draw a
bootstrap "sample" from your data
4) rerun your model of interest on this "sample"
5) test whether your parameter of interest equals the
"population value"
6) store the p-value (and often it is also interesting to store
the parameter of interest)
7) repeat steps 3-6 many times, for example using -simulate-,
see -help simulate-.

The stored p-values should follow a uniform distribution. This means
that you will reject the true null hypothesis in 5% of the samples if
you choose a significance level of 5%, and in 10% of the samples if
you choose a significance level of 10%, etc. If the p-value does not
follow a uniform distribution then the nominal significance level and
the true rejection rates will not correspond. The logic of a statistical
test (at a 5% significance level) is that a statement is "trustworthy"
because it used a method that will wrongly reject the null-hypothesis in
only 5% of the times that that method is used. So if there are major
deviations between the nominal significance and true rejection rate
we undermine the logic behind the test. Large deviations from the
uniform distribution in the p-values correspond to large deviations in
the rejection rate compared to nominal significance levels.

It is often also informative to look at the "sampling distribution" of
the parameter of interest itself.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

```
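
The seven steps in the post above could be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the example dataset (`nlswork`), the model (`ln_wage` on `tenure` with fixed effects), the program name `clustsim`, and the number of replications are all placeholders that do not come from the original thread. Substitute your own data, model, and cluster variable.

```stata
* Sketch only. The dataset (nlswork), the model, and the program name
* clustsim are illustrative assumptions, not taken from the post.
clear all
webuse nlswork

* Steps 1-2: fit the model once and keep the coefficient as the
* "population value"
xtset idcode
xtreg ln_wage tenure, fe vce(cluster idcode)
local b_pop = _b[tenure]

* Keep a copy of the data so every replication starts from the original
tempfile orig
save "`orig'"

capture program drop clustsim
program define clustsim, rclass
    args datafile b_pop
    use "`datafile'", clear
    * Step 3: resample whole clusters; newid keeps a cluster that is
    * drawn more than once distinct from its other copies
    bsample, cluster(idcode) idcluster(newid)
    * Step 4: rerun the model on the bootstrap "sample"
    xtset newid
    xtreg ln_wage tenure, fe vce(cluster newid)
    * Steps 5-6: test against the "population value", return p and b
    test tenure = `b_pop'
    return scalar p = r(p)
    return scalar b = _b[tenure]
end

* Step 7: repeat many times; simulate replaces the data in memory
* with one row of results per replication
simulate p=r(p) b=r(b), reps(500) seed(12345): clustsim "`orig'" `b_pop'

* Share of replications that reject at the 5% level; this should be
* close to 0.05 if the cluster-robust standard errors behave well
count if p < 0.05
display "rejection rate at 5%: " r(N)/_N

* The p-values should look roughly uniform on [0, 1]
histogram p, width(0.05) start(0)
```

Run this as a single do-file, since the local macros are lost between separate interactive runs. Reloading the saved data inside the program is one way to restore the original sample before each draw. The stored `b` can be inspected in the same way (for example with `histogram b`) to look at the "sampling distribution" of the parameter itself.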