



st: multicollinearity with survey data


From   Christine Gourin <cgourin1@jhmi.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: multicollinearity with survey data
Date   Tue, 22 Feb 2011 11:55:41 -0500

I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
http://www.stata.com/support/faqs/res/statalist.html#toask

I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.

I am not sure how to check for multicollinearity in the full model, which is


xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity



When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.



But when I check the VIF as described at the link above, HOSP_TEACH does not appear to be collinear.



I have done so several ways:



1) testing different combinations of the independent variables, for example:

xi: svy: regress  HOSP_TEACH elective
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

this gives output of
tolerance = .99708964 VIF = 1.0029189



2) testing the dependent variable against individual independent variables:

xi: svy: regress  HVH  HOSP_TEACH

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

 this gives output of

------------------------------------------------------------------------------
             |             Linearized
         HVH |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  HOSP_TEACH |   .2701522   .0414694     6.51   0.000      .188855    .3514494
       _cons |   1.52e-14          .        .       .            .           .
------------------------------------------------------------------------------

and also tolerance = .90653199 VIF = 1.103105





3) running a full regression with each independent variable in turn as the left-hand variable, for example:

xi: svy: regress HOSP_TEACH i.RACE i.comorbidity HVH elective age65 flap neckdissection i.procedure i.payor radiation

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))



I get tolerance = .95517604 VIF = 1.0469274



4) finally, if I just run the full model as a linear regression and display the tolerance:



xi: svy: regress  HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))



HOSP_TEACH is not dropped and the tolerance = .87624609 VIF = 1.1412319



Does this suggest I should leave all variables in?



********************************



None of these steps suggests that HOSP_TEACH is collinear, though I am unclear which of these 4 approaches is the correct one to use.
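
For reference, the textbook VIF for a single predictor comes from an auxiliary regression of that predictor on the other predictors only, with the outcome left out. A minimal sketch for HOSP_TEACH, assuming the variable names from the full model above and that the survey design has already been declared with svyset:

```stata
* Auxiliary regression: HOSP_TEACH on the other predictors only.
* HVH (the outcome) is deliberately left off the right-hand side.
xi: svy: regress HOSP_TEACH elective i.agecat flap neckdissection i.procedure i.payor radiation i.RACE i.comorbidity

* Tolerance is 1 - R-squared of this auxiliary regression; VIF is its reciprocal.
display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))
```

The same pattern can be repeated with another predictor on the left-hand side.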





When I run my final model as a logistic regression:

xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH i.RACE i.comorbidity
svylogitgof



HOSP_TEACH is dropped.



Which is the right step to take to test for multicollinearity?

And am I confusing collinearity with perfect prediction? Should I drop HOSP_TEACH from my final model (which will give me more power, population-size wise)?
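
One way to see whether the dropped variable reflects perfect prediction, rather than collinearity among the predictors, is to cross-tabulate the outcome against it and look for an empty cell. A minimal sketch, assuming HVH and HOSP_TEACH are both coded 0/1 and the design is already svyset:

```stata
* Unweighted cell counts: an empty cell (for example, no HVH == 1 cases
* among HOSP_TEACH == 0) means HOSP_TEACH predicts the outcome perfectly there.
tabulate HVH HOSP_TEACH

* Design-based version: weighted counts plus unweighted observations per cell.
svy: tabulate HVH HOSP_TEACH, count obs
```

An empty cell here would be consistent with the observation above that all HVH hospitals are teaching hospitals.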



Many thanks in advance.


