
From: kmacdonald@stata.com (Kristin MacDonald, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Wald test limit?
Date: Fri, 08 Feb 2008 17:42:52 -0600

Keith Dear <keith(dot)dear(at)anu(dot)edu(dot)au> is fitting negative binomial and Poisson models with 261 covariates and the -cluster()- option. Keith's concern is that the Wald chi2 statistic is missing and that the degrees of freedom reported by -nbreg- and -poisson- are less than 261. We privately requested a copy of Keith's data so that we could identify the cause of the reduced degrees of freedom.

For an overall model Wald chi2 test, the degrees of freedom that are typically reported correspond to the number of constraints being tested, i.e., the number of covariates in the model. Sometimes, however, it is not possible to simultaneously test that all of the coefficients are zero. In these cases, Stata reports a missing Wald chi2 statistic, and the degrees of freedom that are reported correspond to the maximum number of constraints that could have been tested simultaneously. This number is linked to the rank of the variance-covariance matrix: in order to perform a simultaneous test on all the coefficients, we need the covariance matrix to be of full rank.

The following situations can cause the rank of the covariance matrix to be too small to perform the overall Wald chi2 test.

1. The -cluster()- option has been specified. When variances are adjusted for clustering, the rank of the variance-covariance matrix is limited by the number of clusters. The number of constraints that can be tested is at most c-1, where c is the number of clusters. Keith has 389 clusters, so this is not the cause of the missing test statistic and reduced degrees of freedom in his case.

2. Some of the regressors are sparse indicators. Keith has only a few indicator variables in his model, none of which could be considered sparse, so this is not the cause of the reduced rank of the variance-covariance matrix in his case either.

3. Correlation exists between the regressors. Throughout Stata, the -_rmcoll- command is used to check for multicollinearity and, if it is detected, to drop variables from the model in order to correct the problem. However, because we are working at machine precision, we need to set a threshold at which variables are considered collinear. It may happen that variables are highly correlated, although not enough to be dropped from the model. If this is the case, the variance-covariance matrix may not be of full rank.

This last situation turns out to be the problem with Keith's model. We re-fit Keith's model without the -cluster()- or -robust- option. Even in this case we can see the effects of collinearity: the rank of the covariance matrix is 260. Including the -robust- or -cluster()- option could potentially reduce the rank even further, reducing the number of constraints that can be simultaneously tested.

--Kristin
--Isabel
kmacdonald(at)stata(dot)com
mcanette(at)stata(dot)com
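
The c-1 limit in point 1 can be checked numerically. What follows is a minimal sketch in Python, not Stata's implementation. It assumes the usual sandwich form of the cluster-adjusted covariance, whose middle factor is the sum over clusters of the outer products of cluster-level score vectors. Because those score vectors sum to zero at the fitted coefficients, at most c-1 of them are linearly independent. The function names and the artificial scores (g, g^2, ..., g^k) are hypothetical and were chosen only so that the result is exact and reproducible.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a matrix of Fractions, by Gaussian elimination."""
    m = [list(r) for r in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def clustered_meat_rank(n_params, n_clusters):
    """Rank of sum_g s_g s_g' for artificial cluster scores that sum to zero."""
    # Hypothetical scores: cluster g contributes (g, g^2, ..., g^k).
    scores = [[Fraction(g) ** j for j in range(1, n_params + 1)]
              for g in range(1, n_clusters + 1)]
    # Center the scores so they sum to zero, as they do at the fitted model.
    means = [sum(col) / n_clusters for col in zip(*scores)]
    centered = [[s - mu for s, mu in zip(row, means)] for row in scores]
    # Middle factor of the sandwich: sum over clusters of outer products.
    meat = [[sum(row[i] * row[j] for row in centered)
             for j in range(n_params)]
            for i in range(n_params)]
    return matrix_rank(meat)


if __name__ == "__main__":
    for c in (3, 4, 6, 10):
        print(c, "clusters, 5 coefficients -> rank", clustered_meat_rank(5, c))
```

With 5 coefficients, the rank is 2 for 3 clusters and 3 for 4 clusters, and it reaches the full value of 5 only from 6 clusters on. This matches the c-1 bound: with 389 clusters and 261 covariates, clustering alone would not prevent the overall test.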
