Statalist The Stata Listserver



st: collinearity &c. [was: multinollinearity and non linar estimation]


From   "Austin Nichols" <[email protected]>
To   [email protected]
Subject   st: collinearity &c. [was: multinollinearity and non linar estimation]
Date   Thu, 21 Jun 2007 12:46:53 -0400

Nick et al.--
It may be bootless guessing what Rodolfo actually means, given the
imprecision evident in the subject line and the post, but my guess is
that he means not "perfect collinearity" in the sense of linear
dependence, nor "collinearity" in the sense of "correlations between
some explanatory variables or linear combinations thereof are high,"
but instead some generalization to a nonlinear context, presumably
functional dependence [not the database concept]. I.e., if
y = x^a * w^b * e
and also x = c * w^d + u, then a and b are not identified when u vanishes,
and only weakly identified when u has little variation, even though
x and w are not generically collinear (when d!=1).  Presumably, the
appropriate test would depend on the functional form.

In the example given, one might
g lny=ln(y)
g lnx=ln(x)
g lnw=ln(w)
reg lny lnx lnw
estat vif
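
To see why that log-log regression is the natural place to look: taking logs
of y = x^a * w^b * e gives lny = a*lnx + b*lnw + ln(e), and if x = c * w^d + u
with u small, then lnx is nearly ln(c) + d*lnw, so lnx and lnw are close to
linearly dependent. A minimal simulated sketch of this (the values of a, b, c,
d and the noise scales are arbitrary choices for illustration, not anything
from Rodolfo's problem):

clear
set obs 500
set seed 20070621
gen w = exp(rnormal())
gen u = 0.01*runiform()          // small additive noise, kept positive so x > 0
gen x = 2*w^1.5 + u              // x = c*w^d + u with c=2, d=1.5
gen y = x^0.5 * w^0.25 * exp(0.1*rnormal())
gen lny = ln(y)
gen lnx = ln(x)
gen lnw = ln(w)
reg lny lnx lnw
estat vif                        // huge VIFs: lnx is nearly a linear function of lnw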
There are several user-written contributions easily available via
-findit-, e.g. -perturb-, -collin-, -coldiag-, and -coldiag2-.

Note in particular that the help for -perturb- says "perturb is not
limited to linear regression but can be used for all regression-like
models."

and see also:

Donald E. Farrar and Robert R. Glauber. 1967.
"Multicollinearity: The Problem Revisited,"
The Review of Economics and Statistics, 49(1): 92-107.
http://links.jstor.org/sici?sici=0034-6535%28196702%2949%3A1%3C92%3AMIRATP%3E2.0.CO%3B2-6

D. Belsley, E. Kuh, and R. Welsch. 1980.
Regression Diagnostics: Identifying Influential Data
and Sources of Collinearity.  New York: Wiley.

D. Belsley. 1991.
Conditioning Diagnostics: Collinearity and Weak Data in Regression.
New York: Wiley.

and on the last reference:
<<
Chapter 7 presents some of the most important material in the book for
statisticians. It is based on research done since BKW into determining
when collinearity causes a statistical problem. Even when the
condition indices indicate that collinearity is present, the
collinearity may not be harmfully degrading the parameter estimates.
Here Belsley applies a signal-to-noise ratio approach to quantifying
harmful collinearity. If collinearity is present (as determined by the
condition indices and variance proportions) and the signal-to-noise
ratio for estimating a parameter of interest is not adequate, then
collinearity is harmful. In some cases, however, the signal-to-noise
ratio is inadequate but collinearity is not present. If so, short data
may be the culprit. ("Short data" is a fitting name, as this situation
occurs when the length of an associated column of the design matrix is
too small.)

There has been a great deal of research over the past 10 years on
collinearity-influential observations. Chapter 8 reviews this work,
pointing out that an important issue is whether observations that mask
or create collinearity are good or bad; that is, as with influential
observations in the usual sense, whether they are misleading or
contain crucial information. After much discussion (perhaps giving the
concept of collinearity-influential cases every possible chance),
Belsley concludes that "the collinearity-influential diagnostics seem
to add nothing to what is learned by the influential-data
diagnostics" (p. 270).

Chapter 9 investigates the applicability of Belsley's collinearity
diagnostics to two special instances of the linear regression model.
In the first case the logarithm of one or more interpretable variates
appears in the model. Here Belsley shows that applying the
collinearity diagnostics to the variates that appear in the model (the
logged variates) can be very misleading, as the results are measures
of the sensitivity of the parameter estimates to changes in the logged
variates and not in the interpretable variates. He suggests instead
that the original interpretable variates be scaled so that a small
change in these scaled variates is like a small change in the log of
these scaled variates, and that the collinearity diagnostics be
computed from the log of these scaled variates. The guiding principle
here is that the diagnostics provide an indication as to how sensitive
the least squares solution is to small changes in the design, and so
are meaningless if these changes cannot be interpreted. Belsley
applies the same principle to models containing differenced data,
again emphasizing that the diagnostics must be applied to
interpretable data to be meaningful.

Chapter 10 continues the discussions of Chapter 7 by suggesting
corrective action when collinearity has been diagnosed as present and
harmful. Belsley offers two possibilities: collect more data or
incorporate prior information about the parameters. The idea of
collecting new data is not always practical; it may be too expensive
or conditions may have changed. Alternatively, Belsley suggests some
Bayes-like techniques for incorporating prior information in the hope
that sufficient precision may be gained in the parameter estimates. He
mentions a Bayesian analysis only briefly, pointing out the
difficulties in specifying a prior distribution and in performing many
of the necessary computations, but closes by saying that "one can hope
to see these [Bayesian] techniques gain wider acceptance in the
future" (p. 298). The Bayes-like procedures that he discusses inject
prior information via linear constraints on the parameters. A point
that Belsley is most insistent about is that deleting data that cannot
be shown to be in error is not a solution to a collinearity problem.
>>

from:

Elizabeth H. Slate. 1993.
"Review of _Conditioning Diagnostics: Collinearity and Weak Data
 in Regression_ by David A. Belsley."
Journal of the American Statistical Association, 88(421): 384-385.
http://links.jstor.org/sici?sici=0162-1459%28199303%2988%3A421%3C384%3ACDCAWD%3E2.0.CO%3B2-N
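
On the Chapter 9 point: one reading of Belsley's suggestion (my reading, not
spelled out in the review) is to scale each interpretable variate by, say, its
geometric mean so that its values sit near 1; since d ln(z) = dz/z, a small
change in such a scaled variate is then roughly a small change in its log, and
diagnostics computed from the logs of the scaled variates stay interpretable.
Sticking with the earlier simulated x and w, and using Mata only to pull out
the overall scaled condition number (the user-written -coldiag-/-coldiag2-
mentioned above are meant for the full table of condition indices and variance
proportions):

foreach v of varlist x w {
    quietly gen double __lnv = ln(`v')
    quietly summarize __lnv, meanonly
    gen double lns_`v' = ln(`v'/exp(r(mean)))   // log of the geometric-mean-scaled variate
    drop __lnv
}

mata:
X = st_data(., ("lns_x", "lns_w"))
X = (J(rows(X), 1, 1), X)          // include the constant column, as Belsley's diagnostics do
X = X :/ sqrt(colsum(X:^2))        // scale each column to unit length
s = svdsv(X)                       // singular values of the scaled design
printf("scaled condition number = %9.1f\n", max(s)/min(s))
end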

------------------------------------------------------
On 6/20/07, Nick Cox <[email protected]> wrote:
Multicollinearity is a property of the covariates.

I don't see that it depends on the functional
relationship between the covariates and the response.

Nick
[email protected]
------------------------------------------------------
> Rodolfo Coelho Prates wrote:
> I estimated a nonlinear funcion with nl command and I would
> like to know
> if there is a test to detect multicollinearity in Stata.


