Statalist



Re: st: Wald test limit?


From   kmacdonald@stata.com (Kristin MacDonald, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Wald test limit?
Date   Fri, 08 Feb 2008 17:42:52 -0600

Keith Dear <keith(dot)dear(at)anu(dot)edu(dot)au> is fitting negative binomial
and Poisson models with 261 covariates and the -cluster()- option. Keith's
concern is that the Wald chi2 statistic is missing and that the degrees of
freedom reported by the commands -nbreg- and -poisson- are less than 261.  We
privately requested a copy of Keith's data so that we could identify the cause
of the reduced degrees of freedom.

For an overall model Wald chi2 test, the degrees of freedom that are typically
reported correspond to the number of constraints being tested, i.e., the
number of covariates in the model.  Sometimes, however, it is not possible to
test simultaneously that all of the coefficients are zero.  In these cases,
Stata reports the Wald chi2 statistic as missing.  The degrees of freedom that
are reported then correspond to the maximum number of constraints that could
have been tested simultaneously, which is determined by the rank of the
variance-covariance matrix.

To perform a simultaneous test on all the coefficients, we need the covariance
matrix to be of full rank.  The following situations can make the rank of the
covariance matrix too small to perform the overall Wald chi2 test:

    1.  The -cluster()- option has been specified.  When variances are
        adjusted for clustering, the rank of the variance-covariance matrix is
        limited by the number of clusters: the number of constraints that can
        be tested is at most c-1, where c is the number of clusters.  Keith
        has 389 clusters, so up to 388 constraints could be tested, more than
        his 261 covariates.  This is therefore not the cause of the missing
        test statistic and reduced degrees of freedom in his case.

    2.  Some of the regressors are sparse indicators (indicators that are
        nonzero for very few observations).  Keith has only a few indicator
        variables in his model, none of which could be considered sparse, so
        this is not the cause of the reduced rank of the variance-covariance
        matrix in his case either.

    3.  Some regressors are highly correlated with each other.  Throughout
        Stata, the -_rmcoll- command is used to check for multicollinearity
        and, if it is detected, to drop variables from the model to correct
        the problem.  However, because computations are carried out in finite
        machine precision, we need to set a threshold at which variables are
        considered collinear.  Variables may be highly correlated, yet not
        correlated enough to be dropped from the model.  If this is the case,
        the variance-covariance matrix may not be of full rank.  This turns
        out to be the problem with Keith's model.
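
Why clustering caps the rank at c-1 (item 1) can be sketched in Python; this
is an illustration, not Stata's implementation.  The cluster-robust variance
has the sandwich form inv(A) M inv(A), where M is the sum over clusters of
u_g u_g' and u_g is the sum of the score vectors within cluster g; when A is
nonsingular, the sandwich has the same rank as M.  Each u_g u_g' has rank at
most 1, and at the maximum-likelihood estimates the scores sum to zero, so the
c cluster sums are linearly dependent.  The score sums below are made-up
numbers, and the tolerance is an arbitrary choice.

```python
# Sketch: rank of the cluster-robust "meat" matrix M = sum_g u_g u_g'.
# Hypothetical score sums; tolerance is illustrative only.

def matrix_rank(a, rel_tol=1e-12):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in a]
    n_rows, n_cols = len(m), len(m[0])
    scale = max(abs(x) for row in m for x in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        p = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[p][col]) <= rel_tol * scale:
            continue
        m[rank], m[p] = m[p], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank


def cluster_meat(score_sums):
    """Middle of the sandwich: sum over clusters of the outer product u_g u_g'."""
    k = len(score_sums[0])
    meat = [[0.0] * k for _ in range(k)]
    for u in score_sums:
        for i in range(k):
            for j in range(k):
                meat[i][j] += u[i] * u[j]
    return meat


# Hypothetical cluster score sums for k = 4 parameters and c = 3 clusters.
u1 = [1.0, 2.0, 0.0, 1.0]
u2 = [0.0, 1.0, 3.0, -1.0]
# At the MLE the scores sum to zero, so the last cluster sum is forced.
u3 = [-(a + b) for a, b in zip(u1, u2)]

print(matrix_rank(cluster_meat([u1, u2, u3])))  # 2, i.e. c - 1

# Without the zero-sum restriction, three clusters could reach rank 3.
print(matrix_rank(cluster_meat([u1, u2, [1.0, 0.0, 0.0, 0.0]])))  # 3
```

With 4 parameters but only 3 clusters, no more than 2 constraints could be
tested jointly in this toy case, mirroring the c-1 bound.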

We refit Keith's model without the -cluster()- or -robust- option.  Even in
this case we can see the effects of collinearity: the rank of the covariance
matrix is 260, one less than the number of covariates.  Including the -robust-
or -cluster()- option could reduce the rank even further, thereby reducing the
number of constraints that can be tested simultaneously.
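
The tolerance issue behind item 3 can be illustrated with a small Python
sketch using hypothetical data and a linear-model stand-in for the Poisson and
negative binomial information matrices.  For a linear regression the
conventional covariance matrix is proportional to inv(X'X), so a nearly
singular X'X yields a nearly singular covariance matrix.  Here x2 is x1 plus a
perturbation of size 1e-4, and whether X'X counts as full rank depends
entirely on the relative tolerance.  Both tolerances are arbitrary
illustrations, not the values used by -_rmcoll- or by Stata's rank
computations.

```python
# Sketch: near-collinear regressors and tolerance-dependent rank.
# Hypothetical data; the tolerances are illustrative, not Stata's.

def matrix_rank(a, rel_tol=1e-12):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in a]
    n_rows, n_cols = len(m), len(m[0])
    scale = max(abs(x) for row in m for x in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        p = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[p][col]) <= rel_tol * scale:
            continue
        m[rank], m[p] = m[p], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank


def cross_product(columns):
    """X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical regressor: x1 plus a tiny perturbation, almost perfectly correlated.
x2 = [x + 1e-4 * s for x, s in zip(x1, [1, -1, 1, -1, 1])]
xtx = cross_product([x1, x2])

print(matrix_rank(xtx, rel_tol=1e-12))  # 2: treated as full rank
print(matrix_rank(xtx, rel_tol=1e-8))   # 1: treated as rank deficient
```

The second pivot of X'X here is roughly 5e-8 against entries near 55, so a
dropping rule with one threshold can keep both variables while a rank check
with a looser threshold reports one dimension fewer.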

--Kristin                        --Isabel
kmacdonald(at)stata(dot)com      mcanette(at)stata(dot)com



© Copyright 1996–2014 StataCorp LP