Statalist The Stata Listserver


st: Re: Correlation b/w independent variables in xtlogit

From   "Michael Blasnik" <>
To   <>
Subject   st: Re: Correlation b/w independent variables in xtlogit
Date   Thu, 17 May 2007 09:54:48 -0400


I view this issue as more about interpretability of the coefficient(s). You don't really need to worry about highly correlated terms, like age and age squared, if they are control variables in your model, but you would if either is the primary effect of interest.

Models that include age and age squared typically aim to control for age while allowing for some nonlinearity. The analyst is not usually trying to interpret either coefficient, but is interested in other coefficients in the model. On the other hand, if the highly correlated terms are themselves the primary effects of interest, then collinearity can be a significant problem: you can't really measure the effect of either correlated term very well, and the standard errors should show this. How you should proceed depends on the subject matter: you may want to think about somehow combining the two terms, or you *could* just drop one. If you drop a term, you need to think about how to interpret the remaining term, since its coefficient will now include the effects of the dropped term as well.
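
To make the last point concrete, here is a rough sketch in Python rather than Stata. It is only an illustration under stated assumptions: the data are simulated, the model is an ordinary linear regression rather than xtlogit, and every name in it (x1, x2, y, slope, and so on) is made up for the example. The idea is that when two predictors are correlated at about the level you describe and one is dropped, the coefficient on the remaining one picks up much of the dropped term's effect.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    """Pearson correlation coefficient."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

def slope(x, y):
    """Slope from a least-squares fit of y on the single regressor x."""
    return cov(x, y) / cov(x, x)

# Simulated data (an assumption for illustration, not the poster's data).
rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 tracks x1 closely, so the two are highly correlated.
x2 = [a + rng.gauss(0, 0.4) for a in x1]
# In this made-up model each predictor has a true coefficient of 1.
y = [1.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

r = corr(x1, x2)         # correlation between the two predictors
b_short = slope(x1, y)   # coefficient on x1 when x2 is dropped

print("corr(x1, x2):", round(r, 3))
print("slope of y on x1 alone:", round(b_short, 3))
```

In this setup the slope on x1 alone should come out near 2 rather than 1, because x1 is standing in for x2 as well. The exact magnitudes would differ in a logit model, but the interpretation issue is the same: the remaining coefficient is no longer the effect of that variable alone.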

Michael Blasnik

----- Original Message -----
From: "Alexandra Wilson" <>
To: <>
Sent: Thursday, May 17, 2007 1:05 AM
Subject: st: Correlation b/w independent variables in xtlogit

Dear Statalisters.
I have a simple question: if the answer is well known to everyone but me,
apologies, but I am living in Tanzania where there is a dearth of
statisticians and stats books, and I have trawled the internet and the
statalist archives to no avail.
I am running a panel regression with a dichotomous variable using xtlogit.
I was getting strange (unexpected) results, and realized 2 of my independent
variables were highly correlated (correlation coefficient 0.92).  So I
omitted one and the results were much more in line with other tests.  But in
my list of independent variables I still have a variable for age (of panel
subject) and a variable for the square of age.  These 2 variables are, of
course, also highly correlated.  So why is it correct to leave both these
highly correlated variables in the regression, and yet to exclude the other
highly correlated variable?
Any enlightenment much appreciated.
Alexandra Wilson