Re: st: Clustered standard errors: Insufficient observations


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Clustered standard errors: Insufficient observations
Date   Sat, 14 May 2011 13:20:54 -0500

On Sat, May 14, 2011 at 12:30 PM, Jost Heckemeyer <Heckemeyer@zew.de> wrote:
> Dear Statalisters,
> I want to estimate a large cross-country panel model (> 50,000 firm-year
> observations). As some of my main explanatory variables vary mainly at
> the country-level (e.g. tax rates) I cluster standard errors within
> countries, not within firms - as it is generally recommended to do.
>
> However, as soon as I estimate a firm fixed effects model (xtreg, fe
> with option cluster(country), fe are at the firm level) it does not work
> anymore and it just gives me the error "insufficient observations".
> xtreg, re and all pooled estimators all work well with cluster(country).
> xtreg, fe also works with clustering within firms. But this is not what
> I want. It would be great if anyone could help me with this problem.
> What can I do?

Clustered standard errors are intended to work at the highest level of
your sampling, or the highest level at which you expect correlations
in the error terms (because of unobserved or omitted variables, say).
I'd be curious to see who "generally recommends" clustering at
the level at which the explanatory variables vary; if this were good
advice, you would have to cluster on gender in any labor market
regression, leaving you with 2 d.f. to estimate at most 1 parameter
besides the intercept. If you sampled 3 countries from a list of
developing countries, and then 10K firms within country, then you
would want to cluster at the level of the countries (although it won't
produce reasonable results, since you need at least several dozen
clusters to get sensible performance). If you had three countries
because that's where your collaborators have been, the country
dimension has nothing to do with sampling, and the legitimate sampling
units would be firms (unless of course you sampled industry codes
first, in which case you would need to cluster by the industry codes).
If you are concerned about lack of control over your country
dimension, you could specify interactions of your explanatory
variables (or perhaps a subset of them) with the country variable using
something like i.country#(c.employment i.ownership).
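
As a rough sketch of how this might look in practice: all variable names
below (firm, year, country, y, taxrate, employment, ownership) are
hypothetical placeholders, not taken from the original data, and the
choice to cluster on firm assumes that firms are the legitimate sampling
units, as discussed above. The first two commands simply count the
clusters that clustering on country would leave you with.

```stata
* Hypothetical variable names throughout; adapt to your own data.
* Count the countries, i.e. the clusters you would get with cluster(country).
egen byte onecountry = tag(country)
count if onecountry

* Declare the panel structure: firms observed over years.
xtset firm year

* Firm fixed effects, standard errors clustered on firms, with the
* explanatory variables interacted with country (collinear or
* time-invariant terms will be omitted by Stata).
xtreg y taxrate i.country#(c.employment i.ownership), fe vce(cluster firm)
```

If the count comes back well below several dozen, clustering on country
is unlikely to perform sensibly whichever estimator you use.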

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.