Roger Newson <roger.newson@kcl.ac.uk> asked
> Can I assume that, in general, this is how -_rmcoll- works? And, if not,
> then how does it work?
after speculating.
Let varlist contain a set of variable names, among which we want to find
the noncollinear subset. Stata forms syminv((X-mean(X))'(X-mean(X))) by default,
or syminv(X'X) if option -noconstant- is specified, and then searches along the
diagonal for zero values, which correspond to the collinear variables. Thus,
-_rmcoll- lets syminv() do the heavy lifting, which was not only convenient to
program but also guaranteed that -_rmcoll- would produce results consistent
with those of -regress- and all of Stata's other estimation commands.
syminv() is Stata's symmetric-matrix inversion routine. I have written before
about how syminv() works. In a nutshell, syminv() decides which of a
collinear set of variables to drop based on numerical accuracy considerations;
it does *NOT* proceed from left to right, as Roger speculated.
Roger observed that if the same variable appeared in varlist more than
once, the rightmost one was dropped. In general, that will happen in the
equal-variable case because syminv() searches from left to right at every
step, but I cannot guarantee the rightmost result because, in earlier
steps, operations are performed on the matrix, and the order in which those
operations are performed depends on the location of the row/column.
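To make the idea concrete, here is a minimal Python sketch of the approach described above: form the cross-product matrix (in deviations from the mean unless a no-constant form is requested), compute a generalized inverse, and report the columns whose diagonal element is zero. This is not Stata's code. The function names are hypothetical, and the pivot rule (sweep the column with the largest remaining relative diagonal, leftmost on ties) and the tolerance of 1e-10 are assumptions chosen only for illustration; syminv()'s actual rule may differ.

```python
def crossproduct(columns, constant=True):
    """Return (X-mean(X))'(X-mean(X)), or X'X when constant=False."""
    data = []
    for col in columns:
        mean = sum(col) / len(col) if constant else 0.0
        data.append([v - mean for v in col])
    p = len(data)
    return [[sum(a * b for a, b in zip(data[i], data[j])) for j in range(p)]
            for i in range(p)]


def sweep(a, k):
    """Sweep matrix a in place on pivot k (Goodnight-style sweep operator)."""
    d = a[k][k]
    a[k] = [v / d for v in a[k]]
    for i in range(len(a)):
        if i == k:
            continue
        b = a[i][k]
        a[i] = [v - b * w for v, w in zip(a[i], a[k])]
        a[i][k] = -b / d
    a[k][k] = 1.0 / d


def generalized_inverse(a, tol=1e-10):
    """Invert a symmetric matrix, zeroing rows/columns judged collinear.

    Assumed pivot rule (NOT necessarily Stata's): at each step, scan the
    unswept columns from left to right and sweep the one whose remaining
    diagonal is largest relative to its original value.
    """
    p = len(a)
    g = [row[:] for row in a]
    orig = [a[k][k] for k in range(p)]
    remaining = list(range(p))
    while remaining:
        best, best_rel = None, tol
        for k in remaining:  # strict ">" keeps the leftmost column on ties
            rel = g[k][k] / orig[k] if orig[k] > 0 else 0.0
            if rel > best_rel:
                best, best_rel = k, rel
        if best is None:  # nothing left that is safely above the tolerance
            break
        sweep(g, best)
        remaining.remove(best)
    for k in remaining:  # collinear columns get a zero row and column
        for j in range(p):
            g[k][j] = 0.0
            g[j][k] = 0.0
    return g


def collinear_columns(columns, constant=True):
    """Indices of columns whose diagonal in the generalized inverse is zero."""
    g = generalized_inverse(crossproduct(columns, constant))
    return [k for k in range(len(g)) if g[k][k] == 0.0]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
print(collinear_columns([x1, x2, x1]))  # prints [2]
```

Under this assumed rule, a repeated variable loses its rightmost copy, which matches what Roger saw. With variables that are collinear but not identical, the choice depends on the size of the remaining diagonals rather than on position.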
-- Bill
wgould@stata.com