Fellow Statalisters (especially StataCorp):

A query about -_rmcoll-. As I understand it, this command is implemented at the executable level. (At any rate, I cannot find a -_rmcoll.ado- file anywhere in the c:\Stata8\ados\ path on my Windows 2000 system.) It is therefore not obvious exactly how it chooses which of a set of collinear variables to drop. However, my experiments suggest the following. It builds a list of non-collinear variables iteratively. It starts with an empty list and steps through the -varlist- of candidates in the order given, from first to last. At each step it tests whether the current candidate is a linear combination of the variables already in the list, and adds the candidate to the list if it is not. In "pidgin Stata", it seems to work roughly as follows:

local ncvarlist ""
foreach X of varlist `origvarlist' {
    * keep `X' only if it is not a linear combination of the variables already kept
    if islinearlydependent(`X' `ncvarlist') {
        display as text "Note: `X' dropped due to collinearity"
    }
    else {
        local ncvarlist "`ncvarlist' `X'"
    }
}

where -ncvarlist- is the list of non-collinear variables being assembled, -origvarlist- is the original variable list provided, and -islinearlydependent(varlist)- is a fantasy function returning 1 if the variables in -varlist- are linearly dependent and 0 otherwise. I get that impression because, if I include two variables with identical values in a list of X-variables, it is the second one that is dropped, not the first.
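For concreteness, here is a minimal sketch of the order-dependent greedy rule hypothesized above. It is an illustration of that guess and is not StataCorp's implementation. The names drop_collinear and matrix_rank are hypothetical. The sketch uses exact rational arithmetic, whereas a real implementation presumably works in floating point with some tolerance. It also assumes that, by default, an intercept column takes part in the collinearity check, as in an ordinary regression, and makes that optional via a hypothetical constant flag.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def matrix_rank(columns: Sequence[Sequence[float]]) -> int:
    """Exact rank of the matrix with the given columns (Gauss-Jordan on Fractions)."""
    if not columns:
        return 0
    nrows, ncols = len(columns[0]), len(columns)
    # Row-major copy of the matrix in exact rational arithmetic.
    m = [[Fraction(col[i]) for col in columns] for i in range(nrows)]
    rank = 0
    for c in range(ncols):
        # Look for a nonzero pivot in column c at or below row `rank`.
        pivot = next((r for r in range(rank, nrows) if m[r][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the pivot columns found so far
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate column c from every other row.
        for r in range(nrows):
            if r != rank and m[r][c] != 0:
                factor = m[r][c] / m[rank][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == nrows:
            break
    return rank


def drop_collinear(data: Dict[str, List[float]], varlist: List[str],
                   constant: bool = True) -> Tuple[List[str], List[str]]:
    """Hypothetical greedy rule: scan varlist in order, keep a variable only if
    it is not a linear combination of the variables kept so far (plus an
    intercept column when constant=True, an assumed default)."""
    kept: List[str] = []
    dropped: List[str] = []
    n = len(data[varlist[0]]) if varlist else 0
    base = [[1] * n] if constant else []
    for name in varlist:
        candidate = base + [data[v] for v in kept] + [data[name]]
        # base + kept is linearly independent by construction, so the new
        # column is collinear exactly when the rank falls short of the count.
        if matrix_rank(candidate) < len(candidate):
            dropped.append(name)
            print(f"Note: {name} dropped due to collinearity")
        else:
            kept.append(name)
    return kept, dropped


if __name__ == "__main__":
    data = {"x1": [1, 2, 3, 4], "x2": [1, 2, 3, 4], "x3": [2, 0, 1, 5]}
    print(drop_collinear(data, ["x1", "x2", "x3"]))  # x2 (the second copy) is dropped
    print(drop_collinear(data, ["x2", "x1", "x3"]))  # now x1 is dropped instead
```

Under this rule, which of two identical variables survives depends only on their order in the varlist. That matches the behaviour observed with -_rmcoll-, although it does not prove that -_rmcoll- uses the same procedure internally.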

Can I assume that, in general, this is how -_rmcoll- works? And, if not, then how does it work?

Best wishes (and thanks in advance)


