--- Alexandra Wilson <avwilson@bluebottle.com> wrote:
> I am running a panel regression with a dichotomous variable using
> xtlogit. I was getting strange (unexpected) results, and realized 2
> of my independent variables were highly correlated (correlation
> coefficient 0.92). So I omitted one and the results were much more
> in line with other tests.
>
> But in my list of independent variables I still have a variable for
> age (of panel subject) and a variable for the square of age.  These 2
> variables are, of course, also highly correlated.  So why is it
> correct to leave both these highly correlated variables in the
> regression, and yet to exclude the other highly correlated variable?

Good question. The way I think about problems caused by highly correlated
independent variables is that such variables tend to be imperfect
measures of the same concept, so including both means you are in effect
controlling that concept for itself. If the two variables are exactly
the same (perfectly collinear), Stata will drop one. If the correlation
is close to one, Stata will still estimate the model as best it can.
However, if the variables really are two imperfect measures of the same
concept, then the results won't make sense: each coefficient is the
effect of one variable while holding the other, nearly identical,
variable constant.
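One way to put a number on "close to one" is the variance inflation
factor (VIF). It is a standard diagnostic rather than something discussed
above, so treat it as an outside addition. With only two regressors,
VIF = 1/(1 - r^2): in a linear regression, it is the factor by which each
coefficient's sampling variance is inflated compared with uncorrelated
regressors. For -xtlogit- it is only a rough guide. The sketch below is
in Python rather than Stata, and the function name vif_two_predictors is
hypothetical.

```python
import math


def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for either of two regressors with correlation r.

    With exactly two regressors, VIF = 1 / (1 - r**2). It grows without
    bound as |r| approaches 1.
    """
    if abs(r) >= 1.0:
        # Perfect collinearity: the separate coefficients are not identified.
        return math.inf
    return 1.0 / (1.0 - r * r)


# 0.92 is the correlation reported in the question above.
for r in (0.5, 0.92, 0.99, 1.0):
    print(f"r = {r:4.2f}  VIF = {vif_two_predictors(r):.2f}")
```

With r = 0.92, the VIF is about 6.5. At r = 1 it is infinite, which is
the case where Stata drops one of the variables.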

Age and age squared measure different things: together they let the
effect of age change with age (a curved rather than a straight-line
relationship). So if Stata can estimate the model, the results will make
sense. Very high correlation may cause some numerical problems. In those
cases you can remove the correlation between these variables with the
-orthpoly- command; see -help orthpoly-.
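The idea behind orthogonal polynomials can be sketched outside Stata.
The Python below is a hypothetical illustration, not the algorithm that
-orthpoly- uses internally. It takes made-up ages from 20 to 60 and
computes the correlation between age and age squared. It then applies one
Gram-Schmidt step to replace them with a linear term and a quadratic term
that are uncorrelated with each other. The helper names corr and
orthogonal_quadratic are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def orthogonal_quadratic(xs):
    """Return (p1, p2), a linear and a quadratic term that are centered and
    uncorrelated with each other (one Gram-Schmidt step)."""
    mx = mean(xs)
    p1 = [x - mx for x in xs]                  # centered linear term
    sq = [x * x for x in xs]
    msq = mean(sq)
    q = [s - msq for s in sq]                  # centered squared term
    # Remove the part of the squared term that is explained by the linear term.
    b = sum(a * c for a, c in zip(p1, q)) / sum(a * a for a in p1)
    p2 = [c - b * a for a, c in zip(p1, q)]
    return p1, p2


ages = list(range(20, 61))                     # hypothetical ages 20..60
age_sq = [a * a for a in ages]
print(f"corr(age, age^2) = {corr(ages, age_sq):.4f}")

p1, p2 = orthogonal_quadratic(ages)
print(f"corr(p1, p2)     = {corr(p1, p2):.4f}")
```

The new terms are linear combinations of the constant, age, and age
squared, and together they span exactly the same set of models. The
fitted values and predicted probabilities therefore do not change; only
the individual coefficients are reparameterized. For a symmetric age
range like this one, simply centering age before squaring it already
removes the correlation. The Gram-Schmidt step also handles asymmetric
ranges.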

Hope this helps,
Maarten

