Statalist



Re: st: cluster and F test


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: cluster and F test
Date   Sun, 6 Jul 2008 10:05:23 -0400

sara borelli <saraborelli77@yahoo.it>:
An individual SE may be OK, in the sense that a test involving only
one coef may have approximately the right size, but e(V) has rank at most
M-1 and so the upper limit on the number of coefs that can be included in
one joint test is M-1.  The reported SEs ignore the cov between the 37
estimates; they offer a test of one coef each, ignoring the fact that
you can't actually test all 37, or even 14, jointly.  But in this
case, a test of even one coef is suspect, because you have M-1=13,
which is a very small number to consider close to infinity.  50
clusters, or at the very least 20 large balanced clusters, are needed
to be reasonably sure the size distortion is not too large.  In
general, it is probably a bad idea to include more variables
than you have effective df, though for the CRSE, Stata will let you do
it, for various reasons.  For example, if you had 50 clusters, 50
fixed effects for cluster and 120 fixed effects for time, you could
include these 170 effects as 168 dummy variables along with one
explanatory variable of interest.  You can never test the joint sig of
the cluster FE nor the joint sig of the time FE, and you will (one
hopes) not be testing smaller groups of these FE either, so the only
test you plan to do in this case is on the one explanatory variable of
interest, with 49 df.  In this case, you should be fine.  Note the
relevant number is M-k, number of clusters less number of constraints.
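
For readers who want to see where the M-1 limit comes from, here is a
minimal sketch in Python (not Stata, and not a description of how Stata
computes e(V)).  It builds a small made-up data set with M = 4 clusters
and 6 coefficients, forms the cluster-robust sandwich without any
finite-sample scaling (a scalar factor does not change the rank), and
checks its rank using exact fractions.  All names (X, y, scores, meat, V)
and the data are assumptions for illustration.  Because the OLS residuals
are orthogonal to the regressors, the M cluster score vectors sum to
zero, so the variance matrix cannot have rank above M-1, however many
coefficients are estimated.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rank(a):
    """Exact rank by Gaussian elimination (entries must be Fractions)."""
    m = [list(row) for row in a]
    rk = 0
    for c in range(len(m[0])):
        p = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if p is None:
            continue
        m[rk], m[p] = m[p], m[rk]
        for r in range(rk + 1, len(m)):
            if m[r][c] != 0:
                f = m[r][c] / m[rk][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


# Made-up illustrative data: M clusters of SIZE obs each, K coefficients.
M, SIZE, K = 4, 6, 6
N = M * SIZE
# Columns 1, i, i^2, ..., i^(K-1): a constant plus K-1 regressors,
# guaranteed to have full column rank because the i are distinct.
X = [[Fraction(i ** j) for j in range(K)] for i in range(1, N + 1)]
y = [[Fraction(i ** K)] for i in range(1, N + 1)]
cluster = [i // SIZE for i in range(N)]          # cluster index 0..M-1

# OLS fit, done exactly.
XtX_inv = inverse(matmul(transpose(X), X))
beta = matmul(XtX_inv, matmul(transpose(X), y))
resid = [yi[0] - sum(x * b[0] for x, b in zip(row, beta))
         for row, yi in zip(X, y)]

# Cluster score vectors s_g = X_g' e_g, one row per cluster.
scores = [[Fraction(0)] * K for _ in range(M)]
for row, e, g in zip(X, resid, cluster):
    scores[g] = [s + x * e for s, x in zip(scores[g], row)]

# Sandwich: V = (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1, no small-sample factor.
meat = matmul(transpose(scores), scores)
V = matmul(matmul(XtX_inv, meat), XtX_inv)

print("clusters M =", M, " coefficients K =", K)
print("scores sum to zero:", all(sum(col) == 0 for col in zip(*scores)))
print("rank of V =", rank(V), " (cannot exceed M-1 =", M - 1, ")")
```

The same logic applies to the 14-cluster case in this thread: with 37
coefficients, e(V) is 37 x 37 but its rank cannot exceed 13, which is why
a joint test cannot use more than 13 constraints.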

--Austin

On Sun, Jul 6, 2008 at 5:14 AM, sara borelli <saraborelli77@yahoo.it> wrote:
> Hi [Austin],
> thank you very much for your help.
>
> When I test (with the F-test) 18 restrictions with 14 clusters, Stata drops 5 constraints because, as you said, it can test only 13 constraints.
> There is something I do not understand, however. With the cluster option, the number of observations useful to estimate the standard errors becomes the number of clusters, 14. Thus, if I have 37 standard errors to estimate and only 14 clusters, how is it possible that Stata is able to estimate all the standard errors, but still test only 13 constraints?
> Basically, when the number of clusters is smaller than the number of regressors, is only the F-test computed in a wrong way, or also the standard errors?
> I am sorry to keep asking about this, but I am a bit confused.
> Thank you
> Sara
>
> --- Sat 5/7/08, Austin Nichols <austinnichols@gmail.com> wrote:
>
>> From: Austin Nichols <austinnichols@gmail.com>
>> Subject: Re: st: cluster and F test
>> To: statalist@hsphsun2.harvard.edu
>> Date: Saturday 5 July 2008, 19:01
>> sara borelli <saraborelli77@yahoo.it>:
>> The cluster-robust standard error (CRSE) estimator has at most M-1 df
>> with M clusters, so with 14 clusters you can test the joint sig. of at
>> most 13 coefs. But the performance of the estimator gets worse as you
>> increase the number of constraints.  The CRSE's performance
>> improves as M-k increases toward infinity, where M is the number of
>> clusters and k the number of constraints you are testing, and for M-k
>> at least 20 and clusters balanced you should expect good performance.
>> Since you have M-k equal to one (the minimum possible value), you
>> should expect that the estimated variance is too low and the F stat is
>> too high, on average.  Note that clusters are like super-observations,
>> for the purposes of the SE of estimated coefs, so a regression on 37
>> variables with 14 clusters is a bit like a regression on 37 vars with
>> 14 obs--you really don't want to test more than one coef there, and
>> maybe not even that many.  How are your clusters defined?  Is there
>> any possibility of adding more clusters, or redefining them sensibly
>> so you have more clusters?
>>
>> On Fri, Jul 4, 2008 at 5:16 AM, sara borelli
>> <saraborelli77@yahoo.it> wrote:
>> > Dear Stata List members,
>> >
>> > I have found some related questions in the FAQs, but I cannot find exactly what I need.
>> > I am running a regression with the cluster option. I have 37 independent variables (including the constant), 1647 observations, and 14 clusters.
>> > I want to test the joint significance of 18 variables.
>> > If I do NOT use the cluster option the F is calculated correctly as F(18, 1637).
>> > But once I introduce the cluster option I get the following result:
>> >  (1)  x1 = 0
>> >  (2)  x2 = 0
>> >  (3)  x3 = 0
>> >  (4)  x4 = 0
>> >  ...
>> >  (18)  x18 = 0
>> >       Constraint 1 dropped
>> >       Constraint 2 dropped
>> >       Constraint 3 dropped
>> >       Constraint 4 dropped
>> >       Constraint 14 dropped
>> >
>> >       F( 13,    13) =  109.42
>> >            Prob > F =    0.0000
>> >
>> > I guess Stata is doing something with the degrees of freedom, but it is not clear to me what is going on or why it is dropping the constraints. Is the final F test calculated correctly?
>> > Thank you in advance for any help