Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Christine Gourin <cgourin1@jhmi.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: multicollinearity with survey data |
Date | Wed, 23 Feb 2011 15:33:02 -0500 |
Thank you VERY much!

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels
Sent: Wednesday, February 23, 2011 9:01 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: multicollinearity with survey data

On Feb 23, 2011, at 12:03 AM, Christine Gourin wrote:

> thank you!
> how do you test for collinearity with survey data, however?

Christine:

1. Variables can get dropped from any type of regression because of collinearity. However, in logistic regression, they can also get dropped if a category gives perfect prediction, as you found.

2. To measure collinearity, you used e(r2), the overall multiple R-square after -svy: reg-. However, R-square says nothing about collinearity. It quantifies the association of the outcome with all the predictors. The R-squares used in the definition of the VIF come from the regressions of each predictor on the others.

3. The weights are the only part of the survey design that enters the estimation of the VIF. Therefore, to test for collinearity with survey data, run (non-survey) -regress- with a [pw] option; then use -estat vif-.

**********************************
sysuse auto, clear
reg mpg weight turn price [pw=length]
estat vif
*********************************

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783
sjsamuels@gmail.com

> On Feb 22, 2011, at 11:55 AM, Christine Gourin wrote:
>
> > I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
> > http://www.stata.com/support/faqs/res/statalist.html#toask
> >
> > I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
> > I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
> > I am not sure how to check for multicollinearity in the full model, which is
> >
> > xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
> >
> > When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.
>
> This message has nothing to do with multicollinearity. Multicollinearity concerns the correlations of predictors with each other. This message refers to the association of the outcome and one predictor. Tabulating HVH against HOSP_TEACH should show you the problem.
>
> Steve
>
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax: 206-202-4783
> sjsamuels@gmail.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
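
A sketch of the point made in the thread that the R-squares behind the VIF come from regressing each predictor on the others, not from the outcome model. It assumes the usual definition VIF = 1/(1 - R-square of the auxiliary regression) and reuses the thread's illustrative auto data, where [pw=length] is only a stand-in for real survey weights. The value displayed for weight should agree with the one -estat vif- reports, though that has not been checked against any particular Stata release.

```stata
* Illustration only: auto.dta and [pw=length] stand in for real survey data
* and sampling weights, as in the example posted in the thread.
sysuse auto, clear

* Auxiliary regression: one predictor (weight) on the other predictors,
* using the same probability weights as the outcome model.
regress weight turn price [pw=length]

* VIF for weight under the assumed definition 1/(1 - R-square).
display "VIF(weight) = " 1/(1 - e(r2))

* Outcome model followed by -estat vif-, for comparison.
regress mpg weight turn price [pw=length]
estat vif
```

Repeating the auxiliary regression with turn or price on the left-hand side gives the other two VIFs. The e(r2) from the outcome model plays no part in any of them.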