# st: multicollinearity with survey data

 From Christine Gourin
Date Tue, 22 Feb 2011 11:55:41 -0500

I have a question about how to check for multicollinearity with survey data.

I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.

I am not sure how to check for multicollinearity in the full model, which is

xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity

When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.

But when I check the VIF per the link attached, it does not appear to be collinear.

I have done so in several ways:

1) Testing different combinations of the independent variables, for example:

xi: svy: regress  HOSP_TEACH elective
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

this gives output of
tolerance = .99708964 VIF = 1.0029189
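
As I understand it, the display line above applies the usual definitions of tolerance and variance inflation factor, where R_j^2 is the R-squared stored in e(r2) after regressing predictor j on the other predictors. The subscript notation is my own shorthand and an assumption, not something taken from the linked page. The second line simply checks the arithmetic against the output shown above:

```latex
\[
\text{tolerance}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\text{tolerance}_j}
\]
\[
\frac{1}{0.99708964} \approx 1.0029189
\]
```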

2) Testing the dependent variable against individual independent variables:

xi: svy: regress  HVH  HOSP_TEACH

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

this gives output of

------------------------------------------------------------------------------
             |             Linearized
         HVH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  HOSP_TEACH |   .2701522   .0414694     6.51   0.000      .188855    .3514494
       _cons |   1.52e-14          .        .       .            .           .
------------------------------------------------------------------------------

but also tolerance = .90653199 VIF = 1.103105

3) Running a full regression with each independent variable in turn as the outcome, for example:

xi: svy: regress HOSP_TEACH i.RACE i.comorbidity HVH elective age65 flap neckdissection i.procedure i.payor radiation

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

I get tolerance = .95517604 VIF = 1.0469274

4) Finally, running the full model as a linear regression and displaying the tolerance:

xi: svy: regress  HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

HOSP_TEACH is not dropped, and the tolerance = .87624609 VIF = 1.1412319

Does this suggest I should leave all variables in?

********************************

None of these steps suggests that HOSP_TEACH is collinear, though I am unclear which of these 4 approaches is the correct one to use.

When I run my final model as a logistic regression:

xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
svylogitgof

HOSP_TEACH is dropped.

Which is the right step to take to test for multicollinearity?

And am I confusing collinearity with perfect prediction? Should I drop HOSP_TEACH from my final model (which will give me more power, population-size wise)?

Many thanks in advance.
