Stata 15 help for _rmcoll

[P] _rmcoll -- Remove collinear variables


Identify variables to be omitted because of collinearity

_rmcoll varlist [if] [in] [weight] [, noconstant collinear expand forcedrop]

Identify independent variables to be omitted because of collinearity

_rmdcoll depvar indepvars [if] [in] [weight] [, noconstant collinear expand normcoll]

varlist and indepvars may contain factor variables; see fvvarlist. varlist, depvar, and indepvars may contain time-series operators; see tsvarlist. fweights, aweights, iweights, and pweights are allowed; see weight.


_rmcoll returns in r(varlist) an updated version of varlist that is specific to the sample identified by if, in, and any missing values in varlist. _rmcoll flags variables that are to be omitted because of collinearity. If varlist contains factor variables, then _rmcoll also enumerates the levels of factor variables, identifies the base levels of factor variables, and identifies empty cells in interactions.

The following message is displayed for each variable that _rmcoll flags as omitted because of collinearity:

note: ______ omitted because of collinearity

The following message is displayed for each empty cell of an interaction that _rmcoll encounters:

note: ______ identifies no observations in the sample

ml users: it is not necessary to call _rmcoll because ml flags collinear variables for you, assuming that you do not specify ml model's collinear option. Even so, ml programmers sometimes use _rmcoll because they need the sample-specific set of variables, and in such cases, they specify ml model's collinear option so that ml does not waste time looking for collinearity again.
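The workflow described for ml programmers can be sketched as follows. This is a minimal illustration, not a definitive implementation: the evaluator name mynormal_lf, the choice of a normal linear model, and the variables used are all assumptions made for the example.

```stata
* Hypothetical lf evaluator for a normal linear model (name is illustrative)
capture program drop mynormal_lf
program define mynormal_lf
    version 15
    args lnf mu lnsigma
    quietly replace `lnf' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end

sysuse auto, clear
generate tt = turn + trunk          // tt is collinear with turn and trunk

* Obtain the sample-specific, flagged variable list once
_rmcoll turn trunk tt
local xvars `r(varlist)'

* collinear tells ml model not to repeat the collinearity check
ml model lf mynormal_lf (mu: mpg = `xvars') (lnsigma:), collinear
ml maximize
```

Because the list passed to ml model is already flagged, the collinear option should only skip the redundant check; it is not meant to reintroduce the omitted variable.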

_rmdcoll performs the same task as _rmcoll and checks that depvar is not collinear with the variables in indepvars. If depvar is collinear with any of the variables in indepvars, then _rmdcoll reports the following message with the 459 error code:

______ collinear with ______
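As a rough sketch of how a caller might trap this error, the following builds a dependent variable that is an exact linear combination of two regressors and captures the return code. The variable name y and the coefficients are assumptions chosen only for illustration.

```stata
sysuse auto, clear

* y is an exact linear combination of turn and trunk (illustrative construction)
generate double y = 2*turn + 3*trunk

* capture noisily shows the message but lets the do-file continue
capture noisily _rmdcoll y turn trunk
local rc = _rc
display "return code: `rc'"
```

Saving _rc in a local macro immediately keeps the code available even if later captured commands reset it.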


noconstant specifies that, in looking for collinearity, an intercept not be included. That is, a variable that contains the same nonzero value in every observation should not be considered collinear.

collinear specifies that collinear variables not be flagged.

expand specifies that the expanded, level-specific variables be posted to r(varlist). This option will have an effect only if there are factor variables in the variable list.

forcedrop specifies that collinear variables be dropped from the variable list instead of being flagged. This option is not allowed when the variable list already contains flagged variables, factor variables, or interactions.

normcoll specifies that collinear variables have already been flagged in indepvars. Otherwise, _rmcoll is called first to flag any such collinearity.
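The effect of noconstant and forcedrop can be seen by comparing r(varlist) across calls. The sketch below assumes the auto dataset; the variables one and tt are constructed only for the illustration.

```stata
sysuse auto, clear
generate tt = turn + trunk          // collinear with turn and trunk
generate byte one = 1               // collinear with an intercept

* Default: an intercept is assumed, so a constant variable is collinear
_rmcoll one mpg
display "with intercept: `r(varlist)'"

* noconstant: no intercept is assumed when checking
_rmcoll one mpg, noconstant
display "noconstant:     `r(varlist)'"

* Default: the collinear variable stays in the list but is flagged
_rmcoll turn trunk tt
display "flagged:        `r(varlist)'"

* forcedrop: the collinear variable is removed from the list
_rmcoll turn trunk tt, forcedrop
display "forcedrop:      `r(varlist)'"
```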


_rmcoll and _rmdcoll are typically used when writing estimation commands.

_rmcoll is used if the programmer wants to flag the collinear variables among the independent variables.

_rmdcoll is used if the programmer wants to detect collinearity of the dependent variable with the independent variables.


Setup
    . webuse auto
    . generate tt = turn + trunk

Use _rmcoll to identify that we have a collinearity and flag a variable because of it
    . _rmcoll turn trunk tt
    . display r(varlist)

Pass a factor variable to _rmcoll
    . _rmcoll i.rep78
    . display r(varlist)

Add the expand option to loop over the level-specific, individual variables in r(varlist)
    . _rmcoll i.rep78, expand
    . display r(varlist)


A code fragment for a program that uses _rmcoll might read

    ...
    syntax varlist [fweight iweight] ... [, noCONStant ... ]
    marksample touse
    if "`weight'" != "" {
        tempvar w
        // `exp' as set by syntax already includes the leading equal sign
        quietly gen double `w' `exp' if `touse'
        local wgt [`weight'=`w']
    }
    else local wgt /* is nothing */
    gettoken depvar xvars : varlist
    _rmcoll `xvars' `wgt' if `touse', `constant'
    local xvars `r(varlist)'
    ...

In this code fragment, varlist contains one dependent variable and zero or more independent variables. The dependent variable is split off and stored in the local macro depvar. Then the remaining variables are passed through _rmcoll, and the resulting updated independent variable list is stored in the local macro xvars.
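A complete, runnable variant of this pattern might look like the sketch below. The command name myreg is hypothetical, weights are left out for brevity, and regress stands in for whatever estimator the program would actually implement. It calls _rmdcoll rather than _rmcoll so that the dependent variable is checked as well.

```stata
capture program drop myreg
program define myreg
    version 15
    syntax varlist(numeric min=2) [if] [in] [, noCONStant]
    marksample touse

    * Split off the dependent variable
    gettoken depvar xvars : varlist

    * Check depvar against xvars and flag collinear regressors
    _rmdcoll `depvar' `xvars' if `touse', `constant'
    local xvars `r(varlist)'

    * Stand-in estimator that receives the flagged list
    regress `depvar' `xvars' if `touse', `constant'
end

sysuse auto, clear
generate tt = turn + trunk
myreg mpg turn trunk tt
```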


Stored results

_rmcoll and _rmdcoll store the following in r():

Scalars
    r(k_omitted)    number of omitted variables in r(varlist)

Macros
    r(varlist)      the flagged and expanded variable list
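Because any later r-class command overwrites r(), a caller would normally copy these results into local macros right away. The following is a small sketch of that habit; the macro names k_omit and xvars are illustrative.

```stata
sysuse auto, clear
generate tt = turn + trunk

_rmcoll turn trunk tt
local k_omit = r(k_omitted)         // save before another command resets r()
local xvars `r(varlist)'

display "number omitted: `k_omit'"
display "variable list:  `xvars'"

* With expand, each level-specific term can be visited in a loop
_rmcoll i.rep78, expand
foreach v in `r(varlist)' {
    display "`v'"
}
```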

© Copyright 1996–2018 StataCorp LLC