help _rmcoll, help _rmdcoll
-------------------------------------------------------------------------------
Title
[P] _rmcoll -- Remove collinear variables
Syntax
Identify variables to be omitted because of collinearity
_rmcoll varlist [if] [in] [weight] [, noconstant collinear expand
forcedrop]
Identify independent variables to be omitted because of collinearity
_rmdcoll depvar indepvars [if] [in] [weight] [, noconstant collinear
expand forcedrop normcoll]
fweights, aweights, iweights, and pweights are allowed; see weight.
Description
_rmcoll returns in r(varlist) an updated version of varlist that is
specific to the sample identified by if, in, and any missing values in
varlist. _rmcoll flags variables that are to be omitted because of
collinearity. If varlist contains factor variables, then _rmcoll also
enumerates the levels of factor variables, identifies the base levels of
factor variables, and identifies empty cells in interactions.
The following message is displayed for each variable that _rmcoll flags
as omitted because of collinearity:
note: ______ omitted because of collinearity
The following message is displayed for each empty cell of an interaction
that _rmcoll encounters:
note: ______ identifies no observations in the sample
ml users: it is not necessary to call _rmcoll because ml flags collinear
variables for you, assuming that you do not specify ml model's collinear
option. Even so, ml programmers sometimes use _rmcoll because they need
the sample-specific set of variables, and in such cases, they specify ml
model's collinear option so that ml does not waste time looking for
collinearity again.
_rmdcoll performs the same task as _rmcoll and checks that depvar is not
collinear with the variables in indepvars. If depvar is collinear with
any of the variables in indepvars, then _rmdcoll reports the following
message with the 459 error code:
______ collinear with ______
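A calling program may want to trap this error rather than let it stop execution. The following do-file fragment is a minimal sketch of one way to do that, assuming the auto dataset shipped with Stata (loaded here with sysuse) and a constructed variable tt that is an exact linear combination of two regressors. The local macro name rc and the displayed messages are illustrative only.

```stata
sysuse auto, clear
generate tt = turn + trunk              // tt is an exact sum of turn and trunk

// run _rmdcoll under capture so that an error does not stop the do-file
capture noisily _rmdcoll tt turn trunk
local rc = _rc                          // keep the return code before it changes

if `rc' == 459 {
        // the error code documented above for a collinear dependent variable
        display as text "dependent variable is collinear with the regressors"
}
else if `rc' {
        // any other error is passed on unchanged
        exit `rc'
}
else {
        display as text "checked variable list: `r(varlist)'"
}
```

Storing _rc in a local macro immediately after the captured command keeps the test independent of any later command that resets _rc.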
Options
noconstant specifies that, in looking for collinearity, an intercept not
be included. That is, a variable that contains the same nonzero
value in every observation should not be considered collinear.
collinear specifies that collinear variables not be flagged.
expand specifies that the expanded, level-specific variables be posted to
r(varlist). This option will have an effect only if there are factor
variables in the variable list.
forcedrop specifies that collinear variables be dropped from the variable
list instead of being flagged. This option is not allowed when the
variable list already contains flagged variables, factor variables,
or interactions.
normcoll specifies that collinear variables have already been flagged in
indepvars. Otherwise, _rmcoll is called first to flag any such
collinearity.
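The effect of these options is easiest to see by running _rmcoll on the same variables with and without each option. The do-file fragment below is a sketch under stated assumptions: it uses the auto dataset shipped with Stata (via sysuse), and the variables tt and one are constructed only for illustration. No output is shown because the exact form of r(varlist) depends on which variable _rmcoll chooses to flag.

```stata
sysuse auto, clear
generate tt  = turn + trunk             // exact linear combination
generate one = 1                        // same nonzero value in every observation

// default: collinear variables stay in r(varlist) but are flagged
_rmcoll turn trunk tt
display "`r(varlist)'"
display r(k_omitted)

// forcedrop: collinear variables are removed from the list instead
_rmcoll turn trunk tt, forcedrop
display "`r(varlist)'"

// collinear: no flagging is done
_rmcoll turn trunk tt, collinear
display "`r(varlist)'"

// with an intercept assumed, the constant variable is a candidate for flagging
_rmcoll one turn trunk
display "`r(varlist)'"

// noconstant: no intercept is assumed, so one is not collinear on that account
_rmcoll one turn trunk, noconstant
display "`r(varlist)'"
```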
Remarks
_rmcoll and _rmdcoll are typically used when writing estimation commands.
_rmcoll is used if the programmer wants to flag the collinear variables
from the independent variables.
_rmdcoll is used if the programmer wants to detect collinearity of the
dependent variable with the independent variables.
Examples
---------------------------------------------------------------------------
Setup
. webuse auto
. generate tt = turn + trunk
Use _rmcoll to detect the collinearity among turn, trunk, and tt and to
flag a variable because of it
. _rmcoll turn trunk tt
. display r(varlist)
Pass a factor variable to _rmcoll
. _rmcoll i.rep78
. display r(varlist)
Add the expand option so that r(varlist) contains the level-specific,
individual variables, which can then be looped over
. _rmcoll i.rep78, expand
. display r(varlist)
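Once the expanded list is in r(varlist), a loop over its elements might look like the following do-file fragment. This is only a sketch: it loads the auto dataset with sysuse, and the loop variable name v is arbitrary.

```stata
sysuse auto, clear

// post the level-specific variables of rep78 to r(varlist)
_rmcoll i.rep78, expand

// `r(varlist)' is expanded once, before the loop starts,
// so commands inside the loop cannot disturb the list
foreach v in `r(varlist)' {
        display "`v'"
}
```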
---------------------------------------------------------------------------
A code fragment for a program that uses _rmcoll might read
        ...
        syntax varlist [fweight iweight] ... [, noCONStant ... ]
        marksample touse
        // copy the weight expression into a temporary variable
        if "`weight'" != "" {
                tempvar w
                // `exp' as set by syntax already begins with the equal sign
                quietly gen double `w' `exp' if `touse'
                local wgt [`weight'=`w']
        }
        else    local wgt        /* is nothing */
        // split the dependent variable from the independent variables
        gettoken depvar xvars : varlist
        _rmcoll `xvars' `wgt' if `touse', `constant'
        local xvars `r(varlist)'
        ...
In this code fragment, varlist contains one dependent variable and zero
or more independent variables. The dependent variable is split off and
stored in the local macro depvar. Then the remaining variables are
passed through _rmcoll, and the resulting updated independent variable
list is stored in the local macro xvars.
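The fragment can be turned into a small complete program that also checks the dependent variable with _rmdcoll. The following is a sketch, not the way any official command is written: the program name mycheck, the restriction to at least two numeric variables, and the names of the returned results are assumptions made for illustration. Because _rmcoll has already flagged the independent variables, _rmdcoll is called with normcoll.

```stata
program define mycheck, rclass
        version 11
        // hypothetical command: one dependent and one or more independent variables
        syntax varlist(min=2 numeric) [if] [in] [fweight iweight] [, noCONStant]
        marksample touse

        // copy the weight expression into a temporary variable
        if "`weight'" != "" {
                tempvar w
                quietly generate double `w' `exp' if `touse'
                local wgt [`weight'=`w']
        }

        gettoken depvar xvars : varlist

        // flag collinear independent variables in the estimation sample
        _rmcoll `xvars' `wgt' if `touse', `constant'
        local xvars `r(varlist)'
        local k_omitted = r(k_omitted)  // save before the next r-class call

        // stop with an error if depvar is collinear with the regressors
        _rmdcoll `depvar' `xvars' `wgt' if `touse', `constant' normcoll

        return local xvars `xvars'
        return scalar k_omitted = `k_omitted'
end
```

A possible call, again using the auto dataset and a constructed collinear variable:

```stata
sysuse auto, clear
generate tt = turn + trunk
mycheck price turn trunk tt
return list
```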
---------------------------------------------------------------------------
Saved results
_rmcoll and _rmdcoll save the following in r():
Scalars
    r(k_omitted)    number of omitted variables in r(varlist)
Macros
    r(varlist)      the flagged and expanded variable list
Also see
Manual: [P] _rmcoll
Help: [P] _rmcollright, [R] ml