# Re: st: cluster and F test

 From "Ángel Rodríguez Laso" To statalist@hsphsun2.harvard.edu Subject Re: st: cluster and F test Date Tue, 8 Jul 2008 11:07:53 +0200

```
Following the discussion, I don't fully understand how the degrees of
freedom (number of clusters minus number of strata) and the actual
number of observations are used in svy commands (which are related to
cluster regression). I ask because when I calculate the sample size
needed in a survey to estimate a proportion with a given confidence
level, the number I get is the number of observations, not the number
of degrees of freedom. So I assume that the number of observations is
what determines the standard error, and then I don't know what the
degrees of freedom are used for.

Cheers,

Ángel Rodríguez

2008/7/7, sara borelli <saraborelli77@yahoo.it>:
> Austin,
>
> thank you very much for your help,
> Sara
>
> --- Sun 6/7/08, Austin Nichols <austinnichols@gmail.com> wrote:
>
> > From: Austin Nichols <austinnichols@gmail.com>
> > Subject: Re: st: cluster and F test
> > To: statalist@hsphsun2.harvard.edu
> > Date: Sunday, 6 July 2008, 16:05
> > sara borelli <saraborelli77@yahoo.it>:
> > An individual SE may be OK, in the sense that a test involving only
> > one coef may have approximately the right size, but e(V) has rank
> > at most M-1, and so the upper limit on the number of coefs that can
> > be included in one joint test is M-1.  The reported SEs ignore the
> > cov between the 37 estimates; they offer a test of one coef each,
> > ignoring the fact that you can't actually test all 37, or even 14,
> > jointly.  But in this case, a test of even one coef is suspect,
> > because you have M-1=13, which is a very small number to consider
> > close to infinity.  50 clusters, or at the very least 20 large
> > balanced clusters, are needed to be reasonably sure the size
> > distortion is not too large.  In general, it is probably a bad idea
> > to include more variables than you have effective df, though for
> > the CRSE, Stata will let you do it, for various reasons.  For
> > example, if you had 50 clusters, 50 fixed effects for cluster and
> > 120 fixed effects for time, you could include these 170 effects as
> > 168 dummy variables along with one explanatory variable of
> > interest.  You can never test the joint sig of the cluster FE nor
> > the joint sig of the time FE, and you will (one hopes) not be
> > testing smaller groups of these FE either, so the only test you
> > plan to do in this case is on the one explanatory variable of
> > interest, with 49 df.  In this case, you should be fine.  Note the
> > relevant number is M-k, number of clusters less number of
> > constraints.
> >
> > --Austin
> >
> > On Sun, Jul 6, 2008 at 5:14 AM, sara borelli
> > <saraborelli77@yahoo.it> wrote:
> > > Hi [Austin],
> > > thank you very much for your help.
> > >
> > > When I test (with the F-test) 18 restrictions with 14 clusters,
> > > Stata drops 5 constraints because, as you said, it can test only
> > > 13 constraints.
> > > There is something I do not understand, however. With the cluster
> > > option, the number of observations useful for estimating the
> > > standard errors becomes the number of clusters, 14. Thus, if I
> > > have 37 standard errors to estimate and only 14 clusters, how is
> > > it possible that Stata is able to estimate all the standard
> > > errors, but still test only 13 constraints?
> > > Basically, when the number of clusters is smaller than the number
> > > of regressors, is only the F-test computed in a wrong way, or
> > > also the standard errors? I am confused.
> > > Thank you
> > > Sara
> > >
> > > --- Sat 5/7/08, Austin Nichols <austinnichols@gmail.com> wrote:
> > >
> > >> From: Austin Nichols <austinnichols@gmail.com>
> > >> Subject: Re: st: cluster and F test
> > >> To: statalist@hsphsun2.harvard.edu
> > >> Date: Saturday, 5 July 2008, 19:01
> > >> sara borelli <saraborelli77@yahoo.it>:
> > >> The cluster-robust standard error (CRSE) estimator has at most
> > >> M-1 df with M clusters, so with 14 clusters you can test the
> > >> joint sig. of at most 13 coefs. But the performance of the
> > >> estimator gets worse as you increase the number of constraints.
> > >> The CRSE's performance improves as M-k increases toward
> > >> infinity, where M is the number of clusters and k the number of
> > >> constraints you are testing, and for M-k at least 20 and
> > >> clusters balanced you should expect good performance.  Since you
> > >> have M-k equal to one (the minimum possible value), you should
> > >> expect that the estimated variance is too low and the F stat is
> > >> too high, on average.  Note that clusters are like
> > >> super-observations, for the purposes of the SE of estimated
> > >> coefs, so a regression on 37 variables with 14 clusters is a bit
> > >> like a regression on 37 vars with 14 obs--you really don't want
> > >> to test more than one coef there, and maybe not even that many.
> > >> How are your clusters defined?  Is there any possibility of
> > >> adding more clusters, or redefining them sensibly so you have
> > >> more clusters?
> > >>
> > >> On Fri, Jul 4, 2008 at 5:16 AM, sara borelli
> > >> <saraborelli77@yahoo.it> wrote:
> > >> > Dear Stata List members,
> > >> >
> > >> > I have found some related questions in the FAQs, but I cannot
> > >> > find exactly what I need.
> > >> > I am running a regression with the cluster option. I have 37
> > >> > independent variables (including the constant), 1647
> > >> > observations, and 14 clusters.
> > >> > I want to test the joint significance of 18 variables.
> > >> > If I do NOT use the cluster option the F is calculated
> > >> > correctly as F(18, 1637).
> > >> > But once I introduce the cluster option I get the following
> > >> > result:
> > >> >  (1)  x1= 0
> > >> >  (2)  x2 = 0
> > >> >  (3)  x3 = 0
> > >> >  (4)  x3 = 0
> > >> >  ...
> > >> >  (18)  x18 = 0
> > >> >       Constraint 1 dropped
> > >> >       Constraint 2 dropped
> > >> >       Constraint 3 dropped
> > >> >       Constraint 4 dropped
> > >> >       Constraint 14 dropped
> > >> >
> > >> >       F( 13,    13) =  109.42
> > >> >            Prob > F =    0.0000
> > >> >
> > >> > I guess Stata is doing something with the degrees of freedom,
> > >> > but it is not clear to me what is going on or why it is
> > >> > dropping the constraints. Is the final F test calculated
> > >> > correctly?
> > >> > Thank you in advance for any help
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/

```
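
The rank argument in the thread (e(V) has rank at most M-1, so at most M-1 constraints can be tested jointly) can be checked on a toy example. What follows is only a rough sketch in plain Python, not Stata's implementation: the data, the helper functions and the names `M`, `k`, `meat` and `meat_rank` are all made up for illustration. It fits OLS with k = 4 regressors on 12 observations in M = 3 clusters, using exact fractions so the rank is not blurred by rounding, and builds the middle ("meat") matrix of the cluster-robust sandwich from the per-cluster score sums. Since the OLS residuals satisfy X'e = 0, the cluster score sums add up to zero, which is why the rank cannot exceed M-1. The full sandwich has the same rank, because the outer (X'X)^-1 factors are invertible and any scalar small-sample correction leaves the rank unchanged.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def rank(a):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    m = [row[:] for row in a]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def solve(a, b):
    """Solve a x = b for square nonsingular a (Gauss-Jordan, exact)."""
    n = len(a)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


# Toy data (hypothetical): 12 observations, 3 clusters of 4, and
# 4 regressors (constant, i, i^2, i^3), so k > M as in the thread.
n_obs, M, k = 12, 3, 4
X = [[Fraction(i) ** p for p in range(k)] for i in range(n_obs)]
y = [Fraction((i * i) % 5 + (i % 3)) for i in range(n_obs)]
cluster = [i // 4 for i in range(n_obs)]

# OLS coefficients from the normal equations (X'X) b = X'y.
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(xj * yi for xj, yi in zip(col, y)) for col in Xt]
beta = solve(XtX, Xty)
resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

# Per-cluster score sums s_g = X_g' e_g, one k-vector per cluster.
scores = [[Fraction(0)] * k for _ in range(M)]
for row, e, g in zip(X, resid, cluster):
    scores[g] = [s + x * e for s, x in zip(scores[g], row)]

# Meat of the sandwich: sum over clusters of s_g s_g'.
meat = matmul(transpose(scores), scores)
meat_rank = rank(meat)
print("regressors:", k, "clusters:", M, "rank of meat:", meat_rank)
```

In this setup the printed rank cannot exceed M-1 = 2 even though there are 4 coefficients, which is the same mechanism that makes Stata drop constraints in the joint test above. Each coefficient still gets its own standard error because every diagonal element of the matrix can be nonzero; what is missing is enough rank to invert the relevant block for a joint test of more than M-1 coefficients.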