Re: st: Wald test limit?
email@example.com (Kristin MacDonald, StataCorp LP)
Fri, 08 Feb 2008 17:42:52 -0600
Keith Dear <keith(dot)dear(at)anu(dot)edu(dot)au> is fitting negative binomial
and Poisson models with 261 covariates and -cluster()- option. Keith's concern
is that, even though the Wald chi2 statistic is missing, the degrees of
freedom reported by the commands -nbreg- and -poisson- are less than 261. We
privately requested a copy of Keith's data so that we could identify the cause
of the reduced degrees of freedom.
For an overall model Wald chi2 test, the degrees of freedom that are typically
reported correspond to the number of constraints that are being tested, i.e.,
the number of covariates in the model. Sometimes, however, it is not possible
to simultaneously test that all of the coefficients are zero. In these cases,
Stata reports a missing Wald chi2 test. The degrees of freedom that are
reported correspond to the maximum number of constraints that could have been
tested simultaneously. This number is linked to the rank of the
variance-covariance matrix.
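As a reminder of why the rank matters, here is the usual textbook form of the overall Wald statistic. This is a sketch rather than a description of Stata's internal computation. The symbols are assumptions introduced for illustration: \hat{\beta} is the k x 1 vector of slope coefficients, \hat{V} its estimated covariance block, H the Hessian of the log likelihood for the full parameter vector, u_j the sum of the score vectors within cluster j, and c the number of clusters.

```latex
% Overall Wald test of H_0: \beta = 0 (all k slopes jointly zero)
W = \hat{\beta}^{\prime}\,\hat{V}^{-1}\,\hat{\beta} \;\sim\; \chi^2(k)
\quad \text{(asymptotically, under } H_0\text{)}

% Cluster-adjusted covariance of the full parameter vector,
% up to a finite-sample multiplier
\hat{V}_{\mathrm{cluster}} \;\propto\; H^{-1}
  \Bigl( \sum_{j=1}^{c} u_j u_j^{\prime} \Bigr) H^{-1},
\qquad \sum_{j=1}^{c} u_j = 0
\;\Longrightarrow\;
\operatorname{rank}\bigl(\hat{V}_{\mathrm{cluster}}\bigr) \le c - 1
```

The first line needs \hat{V} to be invertible, which requires rank k. In the second line, the middle term is a sum of c outer products whose vectors sum to zero at the maximum, so its rank cannot exceed c-1, and neither can the rank of any block of the resulting matrix.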
In order to perform a simultaneous test on all the coefficients, we need the
covariance matrix to be of full rank. The following situations can cause the
rank of the covariance matrix to be too small to perform the overall Wald chi2
test:
1. The -cluster()- option has been specified. When variances are
adjusted for clustering, the rank of the variance-covariance matrix is
limited by the number of clusters. The number of constraints that can
be tested is at most c-1, where c is the number of clusters. Keith
has 389 clusters, so this is not the cause of the missing test
statistic and reduced degrees of freedom in his case.
2. Some of the regressors are sparse indicators. Keith only has a few
indicator variables in his model, none of which could be considered
sparse, so this is not the cause of the reduced rank of the
variance-covariance matrix in his case either.
3. Correlation exists between the regressors. Throughout Stata, the
-_rmcoll- command is used to check for multicollinearity and, if
detected, drop variables from the model in order to correct the
problem. However, when working at machine precision, we need to set a
threshold at which to consider variables collinear. It may happen
that variables are highly correlated, although not enough to be
dropped from the model. If this is the case, the variance-covariance
matrix may not be of full rank. This actually turns out to be the
problem with Keith's model.
We re-fit Keith's model without the -cluster()- or -robust- option. Even in
this case we can see the effects of collinearity: the rank of the covariance
matrix is 260, one short of the 261 needed to test all of the covariates
jointly. Including the -robust- or -cluster()- option could potentially lower
the rank even further, reducing the number of constraints that can be
tested.
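For anyone who wants to run this kind of check on their own model, the following is a minimal sketch using simulated data, not Keith's data or the exact steps we followed. The variable names (y, x1-x6, id), the seed, and the sample layout are all hypothetical. The example is set up with five clusters and six covariates so that the first cause listed above would be expected to apply. -vce(cluster id)- is the current spelling of the -cluster()- option.

```stata
* Hypothetical simulated data: 5 clusters, 6 covariates (names are illustrative)
clear
set seed 2008
set obs 200
generate id = ceil(_n/40)                  // cluster identifier, values 1..5
forvalues j = 1/6 {
    generate double x`j' = rnormal()
}
generate y = rpoisson(exp(0.1*x1 - 0.1*x2))

* Covariates that survive Stata's collinearity check
_rmcoll x1-x6
display "retained by _rmcoll: `r(varlist)'"

* Fit with a cluster-adjusted VCE and inspect the stored results
poisson y x1-x6, vce(cluster id)
display "clusters     = " e(N_clust)
display "model df     = " e(df_m)
display "Wald chi2    = " e(chi2)

* Numerical rank of the estimated VCE, computed in Mata
mata: st_numscalar("r_V", rank(st_matrix("e(V)")))
display "rank of e(V) = " scalar(r_V)
```

Comparing the number of clusters, the reported model degrees of freedom, and the rank of e(V) against the number of covariates should indicate which of the three situations applies. Refitting without -vce(cluster id)- and recomputing the rank separates the effect of clustering from the effect of near-collinearity.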