st: Re: Correlation b/w independent variables in xtlogit
I view this issue as more about interpretability of the coefficient(s). You
don't really need to worry about highly correlated terms, like age and age
squared, if they are control variables in your model, but you would if either is
the primary effect of interest.
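To see how strongly age and age squared move together, here is a small
sketch in Python (standard library only). The age range of 18 to 65 is an
assumption made purely for illustration; it does not come from the poster's
data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages, one subject per integer age from 18 to 65.
ages = list(range(18, 66))
age_sq = [a * a for a in ages]

r = pearson(ages, age_sq)
print(f"corr(age, age^2) = {r:.3f}")
```

The correlation comes out very close to 1, yet this is harmless when the two
terms together only serve to control for age.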
Models that include age and age squared are typically interested in controlling
for age and want to allow for some nonlinearity. The analyst is not usually
trying to interpret either of the coefficients, but is actually interested in
other coefficients in the model. On the other hand, if the highly correlated
terms are themselves the primary effects of interest, then collinearity can be
a significant problem -- you can't really measure the effect of either of the
correlated terms very well, and the standard errors should show this. How you
should proceed depends on the subject matter -- you may want to think about
somehow combining the two terms, or you *could* just drop one. If you drop a
term, you need to think about how to interpret the remaining term, since its
coefficient will now include part of the effect of the dropped term as well.
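The last point can be illustrated with a rough simulation. This is only a
sketch under stated assumptions: it uses a plain linear model rather than
xtlogit (the absorption is exact for least squares and only analogous for a
logit), and the names x1, x2, b1, b2 and all the numbers are hypothetical,
chosen so that the two regressors correlate at about 0.92.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

random.seed(0)
n = 5000
rho = 0.92          # assumed correlation between the two regressors
b1, b2 = 1.0, 0.5   # hypothetical true coefficients

x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is built so that its correlation with x1 is about rho.
x2 = [rho * a + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for a in x1]
y = [b1 * a + b2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Regress y on x1 alone, i.e. with x2 dropped from the model.
slope_short = cov(x1, y) / cov(x1, x1)

# Slope from regressing the dropped term x2 on the retained term x1.
delta = cov(x1, x2) / cov(x1, x1)

print(f"coefficient on x1 with x2 dropped: {slope_short:.3f}")
print(f"b1 + b2 * delta:                   {b1 + b2 * delta:.3f}")
```

With x2 dropped, the coefficient on x1 is no longer an estimate of b1 alone;
it is roughly b1 plus b2 times the slope of x2 on x1, which is what "includes
the effects of the dropped term" means in practice.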
----- Original Message -----
From: "Alexandra Wilson" <email@example.com>
Sent: Thursday, May 17, 2007 1:05 AM
Subject: st: Correlation b/w independent variables in xtlogit
I have a simple question: if the answer is well known to everyone but me,
apologies, but I am living in Tanzania where there is a dearth of
statisticians and stats books, and I have trawled the internet and the
statalist archives to no avail.
I am running a panel regression with a dichotomous variable using xtlogit.
I was getting strange (unexpected) results, and realized 2 of my independent
variables were highly correlated (correlation coefficient 0.92). So I
omitted one and the results were much more in line with other tests. But in
my list of independent variables I still have a variable for age (of panel
subject) and a variable for the square of age. These 2 variables are, of
course, also highly correlated. So why is it correct to leave both these
highly correlated variables in the regression, and yet to exclude the other
highly correlated variable?
Any enlightenment much appreciated.