# Re: st: Multicollinearity problem in Logistic survival analysis

- From: Maarten Buis
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Multicollinearity problem in Logistic survival analysis
- Date: Sat, 27 Mar 2010 09:25:25 +0000 (GMT)

--- On Sat, 27/3/10, Lu, Zhenyan wrote:
> In my research I have 6 variables that are highly
> correlated, with correlations ranging from .71 to .83, based on
> large samples (n>140,000). <snip> So I am really
> concerned about the potential problem in the model.

When you add multiple explanatory variables, you want to be
able to distinguish between their effects. If two variables are
*perfectly* correlated, how would you be able to distinguish
between the two? This is why Stata drops one of the variables
when there is perfect correlation. If two variables are
strongly but not perfectly correlated, it will be more
difficult for Stata (or any other statistical software
package) to distinguish the effects of the two variables.
This leads to higher standard errors, which is exactly as it
should be: it is more difficult to distinguish the variables,
so we are more uncertain about the results, so the standard
errors should be larger. In other words, there is no problem
that needs fixing.
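To put a number on "higher standard errors", one common yardstick is the variance inflation factor (VIF). Below is a minimal Python sketch, not Stata, and the function name `vifs_from_correlation` is made up for this illustration. It uses the fact that the VIF of each explanatory variable is the matching diagonal element of the inverse of the correlation matrix of the explanatory variables. Strictly speaking this describes linear regression; for a logit model treat it only as a rough guide. The demo also assumes just two variables; with six mutually correlated variables the VIFs can be larger than the pairwise correlations alone suggest.

```python
import numpy as np


def vifs_from_correlation(R):
    """Return the variance inflation factor of each explanatory variable.

    For a correlation matrix R of the explanatory variables, the VIF of
    variable j is the j-th diagonal element of inv(R), which equals
    1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on
    all the other explanatory variables.
    """
    R = np.asarray(R, dtype=float)
    return np.diag(np.linalg.inv(R))


if __name__ == "__main__":
    # Two variables with the pairwise correlations quoted in the question.
    for r in (0.71, 0.83):
        R = np.array([[1.0, r], [r, 1.0]])
        vif = vifs_from_correlation(R)[0]
        # Standard errors scale (roughly) with the square root of the VIF.
        print(f"r = {r:.2f}: VIF = {vif:.2f}, SE multiplier = {np.sqrt(vif):.2f}")
```

For the two-variable case this prints a VIF of 2.02 (SE multiplier 1.42) at r = .71 and 3.21 (SE multiplier 1.79) at r = .83: the standard errors get larger, which is the extra uncertainty described above, not a sign that the model is broken.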

> And even more complicated is that I have to include square
> terms for each of these 6 variables in the model at the
> same time to test the curvilinear relationship.

Adding square terms is a very limited way of checking for
curvilinearity: a quadratic forces the curve into a parabola
that is symmetric around a single turning point. I like the
linear spline (see: -help mkspline-) as a good compromise
between a flexible non-linear curve and parameters with an
easy interpretation.
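As a rough illustration of what linear spline variables look like, here is a Python sketch of the default (non-marginal) coding that, as I understand the -mkspline- documentation, Stata uses: the first variable is min(x, k1), each middle variable measures how far x has moved through its segment, and the last is max(x, k_last) - k_last. The function name `linear_spline_basis` and the knots at 3 and 7 are hypothetical. The demo fits noise-free data by least squares only to show that each coefficient comes back as the slope within its segment; in a logistic survival model the same variables would simply go into the logit.

```python
import numpy as np


def linear_spline_basis(x, knots):
    """Linear spline variables, one column per segment (hypothetical helper).

    Column 0 is min(x, k1); middle columns are
    max(min(x, k_i), k_{i-1}) - k_{i-1}; the last column is
    max(x, k_last) - k_last. The columns add up to x, and a coefficient
    on column j is the slope of the curve within segment j.
    """
    x = np.asarray(x, dtype=float)
    knots = sorted(knots)
    cols = [np.minimum(x, knots[0])]
    for lo, hi in zip(knots[:-1], knots[1:]):
        # Portion of x that falls between consecutive knots.
        cols.append(np.maximum(np.minimum(x, hi), lo) - lo)
    cols.append(np.maximum(x, knots[-1]) - knots[-1])
    return np.column_stack(cols)


if __name__ == "__main__":
    x = np.linspace(0, 10, 101)
    B = linear_spline_basis(x, knots=[3, 7])   # hypothetical knots
    true_slopes = np.array([1.0, -2.0, 0.5])
    y = 4.0 + B @ true_slopes                  # noise-free piecewise-linear curve
    X = np.column_stack([np.ones_like(x), B])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.round(coef, 6))                   # intercept, then one slope per segment
```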

Others prefer smoother non-linear curves, such as restricted
cubic splines or fractional polynomials. If you want to
interpret the results of those curves, you'll have to make
graphs.

For restricted cubic splines see:
http://www.stata.com/meeting/sweden09/se09_orsini.pdf
and
http://ideas.repec.org/p/boc/dsug09/04.html
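For a feel of what restricted cubic spline variables are, here is a Python sketch of the basis, assuming the usual Harrell-type parameterisation with the truncated cubes scaled by (t_k - t_1)^2. Stata's own scaling may differ, so do not compare coefficients across programs. The function name `rcs_basis` and the knot positions are hypothetical. With k knots the basis is x plus k - 2 nonlinear terms, and the curve is constrained to be linear below the first and above the last knot, which is what the demo checks. Because the individual coefficients have no simple meaning, you would plot the fitted curve rather than read the coefficient table.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis, linear in both tails (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cube: u^3 where u > 0, else 0.
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2   # assumed Harrell-style normalisation
    d = t[-1] - t[-2]
    cols = [x]                    # the linear term
    for j in range(k - 2):
        # The last two truncated cubes cancel the cubic and quadratic
        # parts beyond t[-1], so each term is linear past the last knot.
        s = (pos3(x - t[j])
             - pos3(x - t[-2]) * (t[-1] - t[j]) / d
             + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(s / scale)
    return np.column_stack(cols)


if __name__ == "__main__":
    knots = [2.0, 4.0, 6.0, 8.0]           # hypothetical knot positions
    x = np.linspace(0, 12, 121)
    B = rcs_basis(x, knots)
    print(B.shape)                         # (121, 3): x plus 2 spline terms
    tail = B[x > knots[-1]]
    # Second differences on an even grid are close to zero (floating-point
    # noise), i.e. every column is linear past the last knot.
    print(np.abs(np.diff(tail, n=2, axis=0)).max())
```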

For fractional polynomials see: -help fracpoly-.
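Fractional polynomials transform x by powers taken from a small fixed set, conventionally {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 stands for log(x) and a repeated power p adds an extra x^p log(x) term. The Python sketch below only builds those transformed variables; it does not do the search over power combinations that -fracpoly- performs. The names `fp_term` and `fp_basis` are hypothetical, and x is assumed to be strictly positive.

```python
import numpy as np

# Conventional fractional-polynomial power set; 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """x^p, with the convention that p = 0 means log(x)."""
    return np.log(x) if p == 0 else x ** p


def fp_basis(x, powers):
    """Transformed variables for a fractional polynomial (hypothetical helper).

    A power repeated m times contributes x^p, x^p*log(x), ...,
    x^p*log(x)^(m-1). x must be strictly positive.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("fractional polynomials need x > 0; shift x first")
    cols, prev, rep = [], None, 0
    for p in sorted(powers):
        if p not in FP_POWERS:
            raise ValueError(f"power {p} is not in the conventional set")
        rep = rep + 1 if p == prev else 0   # count repeats of the same power
        cols.append(fp_term(x, p) * np.log(x) ** rep)
        prev = p
    return np.column_stack(cols)


if __name__ == "__main__":
    x = np.array([0.5, 1.0, 2.0, 4.0])
    print(fp_basis(x, [0.5, 0.5]))   # columns: sqrt(x), sqrt(x) * log(x)
```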

Hope this helps,
Maarten

