Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: multicollinearity with survey data

From: Steven Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: multicollinearity with survey data
Date: Wed, 23 Feb 2011 17:20:55 -0500

```
Rachel,

1. A test for zero correlation among predictors has no place in a study of collinearity. Natural correlation among predictors is expected.

Perfectly collinear variables are those with a multiple correlation R-square of 1.0 when regressed on the others; these are the ones that are tossed out by regression programs.  Rather than "test" for multicollinearity (and I shouldn't have used that phrase), the proper approach is to evaluate how bad it is.  The measures for doing so are the variance inflation factor (VIF) for each predictor, or equivalently, the multiple R for predicting that variable from the others.

2. Contrary to your belief, adding collinear variables can improve a model. Indeed, if the goal is simply to get the best possible prediction of Y, then collinearity might be more or less irrelevant.

The real problem caused by high multicollinearity is that it makes it difficult to interpret individual regression coefficients.  For a treatment see any text on multiple regression. It is impossible to give blanket advice about what to do if high collinearity is found. Certainly dropping the most collinear variable is one option; but what if that is a predictor of interest?  There is a large literature on this topic.

Steve

Steven J. Samuels
Consulting Statistician
18 Cantine's Island
Saugerties, NY 12477 USA
Voice: 845-246-0774
Fax:   206-202-4783
sjsamuels@gmail.com

On Feb 23, 2011, at 5:30 AM, rachel grant wrote:

I am not an expert on this so correct me if I am wrong, Stata Listers!
In my models (negative binomial regression) Stata automatically checks
for multicollinearity and omits collinear variables and then tells you
it has done so. Multicollinearity just means that variables are highly
correlated with each other, so if you want to test for it, run a simple
correlation test. Including collinear variables adds no new info to the
model. If you have several variables that are highly correlated with
each other, you only need to use one of these in the model.
Rachel

Rachel Grant
Dept. Life Sciences
Open University
UK

On 23 February 2011 05:03, Christine Gourin <cgourin1@jhmi.edu> wrote:
> thank you!
> how do you test for collinearity with survey data, however?
> ________________________________________
> From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steven Samuels [sjsamuels@gmail.com]
> Sent: Tuesday, February 22, 2011 1:27 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: multicollinearity with survey data
>
>> On Feb 22, 2011, at 11:55 AM, Christine Gourin wrote:
>>
>> i have a question about how to check for multicollinearity with survey data. the only information I can find about this is at the site
>>
>> I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
>> I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
>>
>> I am not sure how to check for multicollinearity in the full model, which is
>>
>>
>> xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity
>>
>>
>>
>> when I run this model, stata drops HOSP_TEACH saying it predicts failure perfectly.
>>
>
> This message has nothing to do with multicollinearity.  Multicollinearity concerns the correlations of predictors with each other. This message refers to the association of the outcome with one predictor.  Tabulating HVH against HOSP_TEACH should show you the problem.
>
>
> Steve
>
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax:   206-202-4783
> sjsamuels@gmail.com
>
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
regards, Rachel

```
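The two checks discussed in the thread can be sketched in Stata as follows. This is a rough illustration only, not anything posted by the participants. Variable names are taken from the model quoted above; the weight variable `wt` is hypothetical, and `radiation` is picked arbitrarily as the predictor to examine. The second step uses a weighted auxiliary regression as one possible way to get the multiple R-square for a predictor, and converts it with VIF = 1/(1 - R-square). Factor-variable notation (`i.`) assumes Stata 11 or later.

```stata
* Sketch only: "wt" is a hypothetical sampling-weight variable.

* 1. Perfect prediction: an empty cell in this 2x2 table means that
*    HOSP_TEACH determines the outcome for one of its levels.
tabulate HOSP_TEACH HVH

* 2. Severity of collinearity for one predictor (here: radiation).
*    Regress it on the remaining predictors and keep the R-square.
*    HOSP_TEACH is left out because the logistic model dropped it.
regress radiation elective flap neckdissection     ///
    i.agecat i.procedure i.payor i.RACE i.comorbidity ///
    [pweight = wt]

* Convert the auxiliary R-square to a variance inflation factor.
display "R-square: " %6.4f e(r2)
display "VIF:      " %6.2f 1/(1 - e(r2))
```

Repeating step 2 with each predictor in turn as the left-hand-side variable gives one VIF per predictor. How large a VIF is "too large" is a judgment call, as the reply above notes.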