
Re: st: _rmcoll query


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: _rmcoll query
Date   Wed, 16 Feb 2005 08:20:58 -0600

Roger Newson <roger.newson@kcl.ac.uk> asked

> Can I assume that, in general, this is how -_rmcoll- works? And, if not,
> then how does it work?

after speculating.

Let varlist contain a set of variable names, among which we want to find
the noncollinear subset.  By default, Stata forms
syminv((X-mean(X))'(X-mean(X))), or syminv(X'X) if option -noconstant- is
specified, and then searches along the diagonal of the result for zero
values, which correspond to the collinear variables.  Thus, -_rmcoll- lets
syminv() do the heavy lifting, which was not only convenient to program but
also guaranteed that -_rmcoll- would produce results consistent with those
of -regress- and all of Stata's other estimation commands.
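
The idea can be tried by hand.  What follows is only a rough sketch, not the actual -_rmcoll- source: it assumes the auto dataset shipped with Stata, and weight2 and the matrix names A and Ainv are made up for the illustration.  It builds the deviation cross-product matrix with -matrix accum-, inverts it with syminv(), and looks for zeros on the diagonal.

```stata
* Illustrative sketch only; weight2, A, and Ainv are hypothetical names.
sysuse auto, clear
generate double weight2 = 2*weight      // exactly collinear with weight

* (X-mean(X))'(X-mean(X)): cross products of deviations from the means
matrix accum A = price weight weight2 length, deviations noconstant

* syminv() zeroes the row and column of any variable it treats as collinear
matrix Ainv = syminv(A)
matrix list Ainv

* count the zero diagonal elements, that is, the variables flagged as collinear
display diag0cnt(Ainv)

* compare with what -_rmcoll- itself reports
_rmcoll price weight weight2 length
display "`r(varlist)'"
```

Which of weight and weight2 ends up with the zero diagonal element is decided by syminv(), so it should not be assumed in advance; the form of r(varlist) may also differ across Stata versions.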

syminv() is Stata's symmetric-matrix inversion routine.  I have written before
about how syminv() works.  In a nutshell, syminv() decides which of a
collinear set of variables to drop based on numerical accuracy considerations;
it does *NOT* proceed from left to right, as Roger speculated.

Roger observed that if the same variable appeared in varlist more than
once, the rightmost one was dropped.  In general, that will happen in the
equal-variable case because syminv() searches from left to right at every
step, but I cannot guarantee that the rightmost copy is the one dropped
because, in earlier steps, operations are performed on the matrix, and the
order in which those operations are performed depends on the location of
the row/column.

-- Bill
wgould@stata.com