Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: multicollinearity with survey data

From: Steven Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: multicollinearity with survey data
Date: Wed, 23 Feb 2011 09:00:56 -0500

```--

On Feb 23, 2011, at 12:03 AM, Christine Gourin wrote:

> Thank you!
> How do you test for collinearity with survey data, however?

Christine:

1. Variables can get dropped from any type of regression because of collinearity. However, in logistic regression they can also get dropped if a category predicts the outcome perfectly, as you found.

2. To measure collinearity, you used e(r2), the overall multiple R-squared after -svy: reg-. However, that R-squared says nothing about collinearity: it quantifies the association of the outcome with all the predictors. The R-squared values used in the definition of the VIF come from the regressions of each predictor on the others.

3. The weights are the only part of the survey design that enters the estimation of the VIF. Therefore, to check for collinearity with survey data, run (non-survey) -regress- with [pw] weights; then use -estat vif-.

**********************************
* Illustration with the auto data; length stands in for a sampling weight
sysuse auto, clear
regress mpg weight turn price [pw=length]
estat vif
**********************************

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783
sjsamuels@gmail.com

> On Feb 22, 2011, at 11:55 AM, Christine Gourin wrote:
>
> I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
>
> I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
> I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
>
> I am not sure how to check for multicollinearity in the full model, which is
>
>
> xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity
>
>
>
> When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.
>

This message has nothing to do with multicollinearity. Multicollinearity concerns the correlations of the predictors with each other; this message refers to the association between the outcome and a single predictor. Tabulating HVH against HOSP_TEACH should show you the problem.

Steve


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
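Point 2 of the reply says that the R-squared values behind the VIF come from regressing each predictor on the others. The sketch below illustrates that for one predictor, reusing the auto data and the stand-in weight `length` from the example in the post. The auxiliary regression and the `display` line are additions for illustration, not part of the original message, and the hand-computed value should agree with the one reported by -estat vif-, though that is an expectation rather than something checked here.

```stata
* Sketch only: compute the VIF for one predictor by hand.
* Assumption: length is used as a stand-in sampling weight, as in the post.
sysuse auto, clear

* Main model: weighted, non-survey regression, then the built-in VIFs
regress mpg weight turn price [pw=length]
estat vif

* Auxiliary regression: the predictor weight on the other predictors,
* using the same weights
regress weight turn price [pw=length]

* VIF for weight = 1/(1 - R-squared of the auxiliary regression)
display "VIF for weight: " 1/(1 - e(r2))
```

Repeating the auxiliary regression with `turn` or `price` as the dependent variable gives the corresponding VIFs for those predictors. Note that the outcome `mpg` never appears in the auxiliary regressions, which is why the overall R-squared of the main model is not a measure of collinearity.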