Statalist


Re: st: HUGE Wald test statistics of testing multiple coefficients in a fixed effect model with cluster option


From   "Austin Nichols" <[email protected]>
To   [email protected]
Subject   Re: st: HUGE Wald test statistics of testing multiple coefficients in a fixed effect model with cluster option
Date   Tue, 11 Mar 2008 10:30:22 -0400

J.J. <[email protected]>:
No, there is no rule of thumb, and no reference directly on point.  But
the point is clear enough not to need one: you can just point out that
the cluster-robust SEs are far too small to be plausible, so you've
concluded you don't have enough clusters to estimate 300 parameters.
Mark Schaffer and I found that the quality of the cluster-robust SE
declines as the number of parameters tested grows--for one parameter it
is almost always good (assuming reasonably balanced large clusters) but
for 10 or 20 it starts to get pretty bad--but "good" and "bad" are
only relative and depend on the true (unknown) error structure.  With
more detail on your particular application, you may get better advice
from the list on picking a reasonable model for the error structure.
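
A minimal sketch of how one might check this on simulated data of the
same shape as the problem described (about 3000 observations, 500
groups, roughly 300 dummies).  Everything in it is an assumption made
for illustration: the variable names (id, x, d, y), the seed, the 6
observations per group, and the iid error structure.  It uses -areg-
rather than -xtreg, fe- because in recent Stata versions -xtreg, fe
vce(robust)- is itself computed as clustered on the panel variable, so
it would not show the robust/cluster contrast.  Factor-variable
notation (i.d) requires Stata 11 or later.

```stata
* Hypothetical simulated panel: 500 groups x 6 obs = 3000 obs
clear
set seed 12345
set obs 500
generate id = _n
expand 6
generate x = rnormal()
generate d = ceil(300*runiform())   // categorical with up to 300 levels
generate y = x + rnormal()          // true dummy effects are zero, iid errors

* Group fixed effects via -areg-, heteroskedasticity-robust SEs
areg y x i.d, absorb(id) vce(robust)
testparm i.d

* Same model with cluster-robust SEs, clustering on the group
areg y x i.d, absorb(id) vce(cluster id)
testparm i.d
```

Since the dummies have no true effect here, a joint test with a
well-behaved variance estimate should rarely reject.  Comparing the two
-testparm- results gives a rough sense of how far the cluster-robust
test drifts when this many coefficients are tested at once.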

On Mon, Mar 10, 2008 at 6:26 PM, J.J. <[email protected]> wrote:
> Austin & Jay,
>
> Thanks for your quick response. I want to estimate these dummy
> variables rather than treat them as errors, so RE is not my choice. I
> do have other explanatory variables in the panel, and their
> contributions to the model (adjusted R-square) are bigger than the 300
> dummy variables. Austin, is there any rule of thumb or paper to cite
> in terms of judging whether my df are enough?
>
> Thanks.
>
> J. J.
>
>
> On Mon, Mar 10, 2008 at 4:26 PM, Austin Nichols <[email protected]> wrote:
> > J.J. <[email protected]>:
> >  If you had no explanatory variables other than dummies for each panel
> >  (the dummy variable form of fixed effects) the standard errors when
> >  clustering by panel would be zero, roughly (i.e. close to zero, by the
> >  standards of machine precision).  Your 300 dummy variables may have a
> >  problem similar to that of the 500 dummies implied by the FE, and it seems
> >  likely you do not have the effective df you need to test 300 coefs.
> >  Note -xtcltest- will tell you clustering is a problem here, since it
> >  is essentially comparing the cluster-robust SE (about zero) to the OIM
> >  SE (not zero) and saying they are different.  While clustering may
> >  well be a problem, you likely do not have the sample size (in number
> >  of clusters, now) to effectively use the cluster-robust SE
> >  calculations.  You may want to explore a parametric approach like
> >  -xtregar- or -xtgee- or explore one of the options in -xtivreg2- (see
> >  the help file for -ivreg2- or the related papers for some detail on SE
> >  options) from SSC.
> >
> >
> >
> >  On Mon, Mar 10, 2008 at 3:27 PM, J.J. <[email protected]> wrote:
> >  > Dear Statalisters,
> >  >
> >  > My question is related to Daniel Simon's question in this unanswered
> >  > post: http://www.stata.com/statalist/archive/2006-03/msg00024.html. I
> >  > estimate a linear fixed effect model (about 3000 observations and 500
> >  > groups) with around 300 dummy variables. With the cluster option, the
> >  > F-statistic for testing the joint significance of these dummy
> >  > variables becomes HUGE (100,000+) even though many of these coefficients
> >  > are dropped in the test. When I just use the robust instead of the cluster
> >  > option, the Wald test produces a reasonable F-statistic. My main
> >  > purpose is to test the joint significance of these dummy variables in
> >  > the fixed effect model. Should I drop the cluster option?
> >  >
> >  > Given the suggestions by Johannes Schmieder, Mark Schaffer and Austin
> >  > Nichols in this post
> >  > (http://www.mata.dk/statalist/archive/2006-09/msg00782.html), I felt
> >  > it is absolutely necessary to use the cluster option with xtreg, fe.
> >  > However, Mark Schaffer and Austin Nichols
> >  > (http://repec.org/usug2007/crse.pdf) allude to the danger of testing
> >  > multiple coefficients after the cluster option. In their simulation,
> >  > the rejection rate increases to 1 as the number of coefficients
> >  > increases. I guess their results indicate that the Wald test in my
> >  > situation (cluster option and so many variables in the model) is not
> >  > valid. What should I do? Any suggestions will be highly appreciated.
> >  >
> >  > One solution is to test if I need the cluster option using the not yet
> >  > available xtcltest (Mark and Austin, when is this program
> >  > available?). If I do need the cluster option, the next option is to
> >  > get rid of some of the dummy variables.
> >  >
> >  > J. J.
>