The Stata listserver

st: Re: collinear categorical variable identification


From   Buzz Burhans <wsb2@cornell.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: collinear categorical variable identification
Date   Thu, 12 Jun 2003 16:22:04 -0400

Thank you, David, for your comments. For the most part I concur with the ideas you express. One clarification, regarding the following comment:

Just as we shouldn't be tempted to use stepwise methods to formulate regression
models, I don't think we should rely on
automated processes for diagnosing and solving problems of collinearity. Buzz
Burhans has indicated that "theoretical plausibility" is one of the criteria he
used. Aside from the estimated coefficients and standard errors (or CIs), which
alert us to the existence of the problem, I submit this is the only criterion
that should be used. (I assume that whatever procedure is followed, when dummy
variables are involved they are excluded in whole sets corresponding to the
original variables and not discarded willy-nilly.)

I am not pursuing an automated procedure. I do think that collinearity arising from redundant or, more likely, partially redundant variables is a real possibility, even in well-thought-out designs, and it seems more likely in some exploratory analyses. I agree that an automated approach is not good, but I think a systematic approach should be used to assess and, in some cases, deal with the issues thus revealed. If the approach is not systematic, I am concerned that decisions about "theoretically plausible" removal, made solely on the basis of the investigator's opinion of plausibility, may add bias, or at least inconsistency, to the process of being informed by the data. It seems that even when one agrees with your ideas, the need to assess and in some cases eliminate collinearity may exist.

In my case, I identified the problem, as you suggest, by the behavior of the standard errors and coefficients. At this point, obtaining more data is not a possibility. In the future, I expect that collecting more data in this area will be informed by the issues I have identified in the data I have; it is nonetheless worth assessing the existing data.

In any case, while I concur with your ideas, I remain interested in possibilities for some systematic approaches to the issue of collinear categorical variables.
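
One candidate for such a systematic check, offered only as a rough sketch, is the generalized variance inflation factor (GVIF) of Fox and Monette. It scores each categorical variable as a whole set of dummies rather than one dummy at a time, which fits the point about excluding dummies in whole sets. Nothing below comes from this thread: the data are made up, the variable names (region, zone, treat) and helper functions are hypothetical, and plain Python is used only to keep the example self-contained. In practice a numerical library would do the linear algebra.

```python
from math import sqrt

def dummies(codes):
    """Indicator columns for one categorical variable, first level omitted."""
    levels = sorted(set(codes))
    return [[1.0 if c == lv else 0.0 for c in codes] for lv in levels[1:]]

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0.0:
            return 0.0
        if p != k:
            m[k], m[p] = m[p], m[k]
            d = -d
        d *= m[k][k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n):
                m[r][c] -= f * m[k][c]
    return d

def gvif(blocks):
    """Return GVIF**(1/(2*df)) for each block (a list of dummy columns).

    GVIF = det(R11) * det(R22) / det(R): R is the correlation matrix of all
    columns, R11 that of the block, R22 that of the remaining columns.
    """
    cols = [col for b in blocks for col in b]
    owner = [i for i, b in enumerate(blocks) for _ in b]
    R = [[corr(a, b) for b in cols] for a in cols]

    def sub(idx):
        return [[R[i][j] for j in idx] for i in idx]

    scores = []
    for i, b in enumerate(blocks):
        own = [j for j, o in enumerate(owner) if o == i]
        rest = [j for j, o in enumerate(owner) if o != i]
        g = det(sub(own)) * det(sub(rest)) / det(R)
        scores.append(g ** (1.0 / (2 * len(b))))
    return scores

# Made-up data: zone largely mirrors region; treat is balanced across both.
region = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
zone   = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]
treat  = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = gvif([dummies(region), dummies(zone), dummies(treat)])
for name, s in zip(["region", "zone", "treat"], scores):
    print(f"{name}: {s:.3f}")
```

In this toy data, region and zone should score clearly above 1 while treat stays at about 1, which flags the first two as partially redundant with each other. A determinant of R at or near zero (a division error or an enormous score) would indicate exact redundancy. The scores only rank where the overlap lies; which set to drop, if any, remains a judgment about theoretical plausibility.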

Thanks again for your response; it is much appreciated.

Buzz



As an aside, which means I'm not necessarily talking about Buzz Burhans'
situation, it's been my experience that far too many "problems" are blamed on
collinearity. A parameter estimate with a large variance is not by itself a
symptom of collinearity, for example. More often than not, it indicates an
irrelevant variable has been included in the analysis -- a theoretical problem
rather than a collinearity problem. In general, misspecification errors are far
more common than collinearity problems and should be ruled out before suspecting
collinearity.

Dave Moore

Buzz Burhans
wsb2@cornell.edu

