Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Austin Nichols <austinnichols@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: Re: st: Direction of the effect of the cluster command on the standard error depends on the inclusion of a control variable |
Date | Thu, 6 Jan 2011 08:59:29 -0500 |
Kit-
Useful to think of super-obs but not quite right. If you have 50 clusters and 100 regressors (with a few thousand obs) but you are only interested in testing one coefficient, you will typically be fine, i.e. you will have negligible bias in the SE thus getting correct inference on average with the CRSE, and it may often be the case that no alternative approach gets you correct inference (except resampling clusters for a cluster-robust bootstrap).

So estimating a regression with 50 obs and 100 coefficients is not quite the right analogy--more useful to think of the "effective" sample size as between M (number of clusters) and N (number of obs), computable using "roh" per Kish, L. (1965), Survey Sampling, New York: Wiley (note that the CRSE is also the standard svy estimator).

On Thu, Jan 6, 2011 at 8:20 AM, Christopher Baum <kit.baum@bc.edu> wrote:
> <>
> On Jan 6, 2011, at 2:33 AM, Stas wrote:
>
>> There are terrible small sample biases exhibited by -robust- and
>> -cluster()- standard errors with small # of observations and clusters,
>> respectively. As was noted by Justina, four clusters is SO far away
>> from asymptotics that I wouldn't even consider the clustered standard
>> errors in your situation.
>
> Just to add one thing to Stas', Justina's and Austin's replies... It is useful to think of the cluster-robust VCE estimator generating 'super-observations', one per cluster. Thus with 4 clusters, you essentially are estimating a model with N=4 to compute the VCE. Some official Stata commands will let you do that, even when the number of coefficients > N. Baum-Schaffer-Stillman -ivreg2- (on SSC) will flag that as a problem, as it does not make much sense to do so. But one of the reasons that a small number of clusters may yield horrible results is that it represents estimation with a very small sample.
>
> Kit

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
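
The "effective" sample size between M and N mentioned in the reply can be illustrated with Kish's design effect, deff = 1 + (m - 1)*roh, where m is the average cluster size and roh the intraclass correlation. What follows is only a rough sketch in Python, not anything from the thread: the function name kish_effective_n is hypothetical, equal-sized clusters and 0 <= roh <= 1 are simplifying assumptions, and N = 3000 is just a stand-in for "a few thousand obs".

```python
def kish_effective_n(n_obs, n_clusters, roh):
    """Approximate effective sample size via Kish's design effect.

    Assumptions (not from the original post): clusters are of equal
    size, and roh (intraclass correlation) lies in [0, 1].
    """
    if n_clusters <= 0 or n_obs < n_clusters:
        raise ValueError("need 0 < n_clusters <= n_obs")
    if not 0.0 <= roh <= 1.0:
        raise ValueError("this sketch assumes 0 <= roh <= 1")
    avg_cluster_size = n_obs / n_clusters
    # Kish design effect: variance inflation relative to simple random sampling
    deff = 1.0 + (avg_cluster_size - 1.0) * roh
    return n_obs / deff


if __name__ == "__main__":
    # 50 clusters as in the post; 3000 obs is an illustrative assumption
    for roh in (0.0, 0.05, 0.5, 1.0):
        n_eff = kish_effective_n(3000, 50, roh)
        print(f"roh={roh:.2f}  effective n={n_eff:.1f}")
```

With roh = 0 the effective sample size equals N, and with roh = 1 it collapses to M, so the "one super-observation per cluster" picture corresponds to the extreme case of perfect within-cluster correlation.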