The Stata listserver

st: Re: Re: collinear categorical variable identification

From   "Scott Merryman" <>
To   <>
Subject   st: Re: Re: collinear categorical variable identification
Date   Fri, 13 Jun 2003 13:46:39 -0500

----- Original Message ----- 
From: "Buzz Burhans" <>
To: <>
Sent: Thursday, June 12, 2003 3:22 PM
Subject: st: Re: collinear categorical variable identification

> Thank you David, for your comments. For the most part I concur with the
> ideas you express.  One clarification, regarding the following comment:
>    Just as we shouldn't be tempted to use
> >stepwise methods to formulate regression models, I don't think we should
> >rely on
> >automated processes for diagnosing and solving problems of collinearity.
> >Burhans has indicated that "theoretical plausibility" is one of the
> >criteria he
> >used.  Aside from the estimated coefficients and standard errors (or
> >which
> >alert us to the existence of the problem, I submit this is the only
> >criterion that should be used.  (I assume that whatever procedure is followed, when
> >dummy
> >variables are involved they are excluded in whole sets corresponding to
> >original variables and not discarded willy-nilly.)
> I am not pursuing an automated procedure.  I do think that the possibility
> of collinearity associated with redundant, or more likely, partially
> redundant variables is potentially real, even in well thought out
> designs.  Such possibility seems more likely in some exploratory analysis
> situations.  In any case, I agree that an automated approach is not good,
> but I think a systematic approach should be used to assess and in some
> cases deal with issues thus revealed.  If the approach is not systematic I am
> concerned that decisions about "plausible theoretical" removal made solely
> on the basis of the investigator's opinion of plausibility may add bias, or
> at least inconsistency, to the process of being informed by the data.  It
> seems that even when one agrees with your ideas, the need to assess and in some
> cases eliminate collinearity may exist. In my case, I identified the
> problem, as you suggest, by the behavior of the errors and
> coefficients.  In this case, at this point, obtaining more data is not a
> possibility.  In the future, I expect that obtaining more data in this
> may be informed by the current issues I have identified in the data I
> have, but it is nonetheless worth assessing the existing data.
> In any case, while I concur with your ideas, I remain interested in
> possibilities for some systematic approaches to the issue of collinear
> categorical variables.
> Thanks again for your response, it is much appreciated.
> Buzz

While not specifically addressing collinear categorical variables, Peter
Kennedy's "Guide to Econometrics" presents two basic options to deal with
multicollinearity:

1. Do nothing.

2. Incorporate Additional Information.
a. Obtain more data

b. Formalize relationships among regressors and estimate in a simultaneous
equations framework.

c. Specify a relationship among some parameters.  Theory may suggest that
two coefficients should be equal or sum to one, for example.

d. Drop a variable.  However, omitting a relevant variable biases the
remaining coefficients unless they are uncorrelated with the omitted
variable.  As noted by Dreze (1983) "setting a coefficient equal to zero
because it is estimated with poor precision amounts to elevating ignorance
to arrogance."

e. Incorporate estimates from other studies.

f. Form a principal component.

g. Shrink the OLS estimates - a ridge or Stein estimator.
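
To make option (g) concrete, here is a minimal sketch of how a ridge
estimator behaves when two regressors are nearly collinear. It is not taken
from Kennedy's text or from the thread: the data are made up, the function
names solve_2x2 and ridge_2var are hypothetical, and the penalty k = 1 is an
arbitrary choice. It is written in plain Python (standard library only)
simply so that it is self-contained.

```python
# Ridge vs. OLS for two nearly collinear regressors (toy data, no intercept).
# Everything here is illustrative: data, names, and the penalty k are assumptions.

def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] @ beta = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def ridge_2var(x1, x2, y, k=0.0):
    """Return (b1, b2) solving (X'X + k*I) b = X'y; k = 0 reproduces OLS."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    # Adding k to the diagonal is the ridge penalty that stabilises the solve.
    return solve_2x2(s11 + k, s12, s22 + k, s1y, s2y)

# Made-up data: x2 is x1 plus a tiny wiggle, so the two are almost collinear,
# and y is roughly 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.01]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = ridge_2var(x1, x2, y, k=0.0)
ridge = ridge_2var(x1, x2, y, k=1.0)
print("OLS:  ", ols)
print("ridge:", ridge)
```

On these numbers OLS splits the effect into large coefficients of opposite
sign, while the ridge estimates are both close to 1, so their sum stays near
the overall slope of about 2. The price is some bias toward zero, which is
the usual trade-off with shrinkage estimators.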

Hope this helps,

Dreze, J. (1983). "Nonspecialist Teaching of Econometrics: A Personal Comment
and Personalistic Lament." Econometric Reviews 2, 291-9.
