
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: multicollinearity with survey data


From   rachel grant <[email protected]>
To   [email protected]
Subject   Re: st: multicollinearity with survey data
Date   Thu, 24 Feb 2011 18:43:35 +0000

Thanks for clarifying! Rachel

On 23 February 2011 22:20, Steven Samuels <[email protected]> wrote:
> Rachel,
>
> Your advice about collinearity is incorrect.
>
> 1. A test for zero correlation among predictors has no place in a study of collinearity. Natural correlation among predictors is expected.
>
> Perfectly collinear variables are those with a multiple correlation R-square of 1.0 when regressed on the others; these are the ones that are tossed out by regression programs.  Rather than "test" for multicollinearity (and I shouldn't have used that phrase), the proper approach is to evaluate how bad it is.  The measures for doing so are the variance inflation factor (VIF) for each predictor, or equivalently, the multiple R-square for predicting that variable from the others.
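> For example (a sketch with hypothetical variables y and x1-x3): -estat vif- reports the VIFs after -regress-, and the same quantity can be computed by hand from the multiple R-square of one predictor regressed on the others, which also works where -estat vif- is unavailable, e.g. after -svy- estimation:
>
>   . regress y x1 x2 x3
>   . estat vif
>   . regress x1 x2 x3
>   . display "VIF for x1 = " 1/(1 - e(r2))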
>
> 2. Contrary to your belief, adding collinear variables can improve a model. Indeed if the goal is simply to get the best possible prediction of Y, then collinearity might be more or less irrelevant.
>
> The real problem caused by high multicollinearity is that it makes it difficult to interpret individual regression coefficients.  For a treatment see any text on multiple regression. It is impossible to give blanket advice about what to do if high collinearity is found. Certainly dropping the most collinear variable is one option; but what if that is a predictor of interest?  There is a large literature on this topic.
>
> Steve
>
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax:   206-202-4783
> [email protected]
>
> On Feb 23, 2011, at 5:30 AM, rachel grant wrote:
>
> I am not an expert on this so correct me if I am wrong Stata Listers!
> In my models (negative binomial regression) Stata automatically checks
> for multicollinearity, omits collinear variables, and then tells you
> it has done so. Multicollinearity just means that variables are highly
> correlated with each other, so if you want to test for it, run a simple
> correlation test. Including collinear variables adds no new info to the
> model. If you have several variables that are highly correlated with
> each other, you only need to use one of these in the model.
> Rachel
>
> Rachel Grant
> Dept. Life Sciences
> Open University
> UK
>
> On 23 February 2011 05:03, Christine Gourin <[email protected]> wrote:
>> thank you!
>> how do you test for collinearity with survey data, however?
>> ________________________________________
>> From: [email protected] [[email protected]] On Behalf Of Steven Samuels [[email protected]]
>> Sent: Tuesday, February 22, 2011 1:27 PM
>> To: [email protected]
>> Subject: Re: st: multicollinearity with survey data
>>
>>> On Feb 22, 2011, at 11:55 AM, Christine Gourin wrote:
>>>
>>> I have a question about how to check for multicollinearity with survey data. The only information I can find about this is at the site
>>> http://www.stata.com/support/faqs/res/statalist.html#toask
>>>
>>> I am using survey data to investigate variables associated with hospital volume (HVH) as the dependent variable.
>>> I suspect that teaching status (HOSP_TEACH) is collinear with HVH, as all HVH hospitals are teaching hospitals.
>>>
>>> I am not sure how to check for multicollinearity in the full model, which is
>>>
>>>
>>> xi: svy: logistic HVH elective i.agecat flap neckdissection i.procedure i.payor radiation HOSP_TEACH  i.RACE i.comorbidity
>>>
>>>
>>>
>>> When I run this model, Stata drops HOSP_TEACH, saying it predicts failure perfectly.
>>>
>>
>> This message has nothing to do with multicollinearity.  Multicollinearity concerns the correlations of predictors with each other.  This message refers to the association of the outcome with one predictor.  Tabulating HVH against HOSP_TEACH should show you the problem.
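>> For example (a hypothetical sketch), a cross-tabulation such as
>>
>>   . tabulate HVH HOSP_TEACH
>>
>> should show an empty cell: if all HVH hospitals are teaching hospitals, HOSP_TEACH predicts one outcome category perfectly, which is why -svy: logistic- dropped it.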
>>
>>
>> Steve
>>
>> Steven J. Samuels
>> Consulting Statistician
>> 18 Cantine's Island
>> Saugerties, NY 12477 USA
>> Voice: 845-246-0774
>> Fax:   206-202-4783
>> [email protected]
>>
>
>
>
> --
> regards, Rachel
>
>



-- 
regards, Rachel

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

