Title

[U] 11.4.3 Factor variables

Description

Factor variables are extensions of varlists of existing variables.  When
a command allows factor variables, in addition to typing variable names
from your data, you can type factor variables, which might look like

        i.varname
        i.varname#i.varname
        i.varname#i.varname#i.varname
        i.varname##i.varname
        i.varname##i.varname##i.varname

Factor variables create indicator variables from categorical variables,
interactions of indicators of categorical variables, interactions of
categorical and continuous variables, and interactions of continuous
variables (polynomials).  They are allowed with most estimation and
postestimation commands, along with a few other commands.
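
As an illustration of the kinds of terms listed above, here is a minimal
sketch using the auto dataset that ships with Stata.  The choice of
dataset and of the variables price, mpg, rep78, and foreign is an
assumption made for this example only.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Indicators from a categorical variable
regress price i.rep78 mpg

* Interaction of a categorical and a continuous variable
regress price i.foreign#c.mpg

* Interaction of a continuous variable with itself (a polynomial term)
regress price mpg c.mpg#c.mpg
```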

There are five factor-variable operators:

Operator  Description
--------------------------------------------------------------------
i.        unary operator to specify indicators
c.        unary operator to treat as continuous
o.        unary operator to omit a variable or indicator
#         binary operator to specify interactions
##        binary operator to specify factorial interactions
--------------------------------------------------------------------

The indicators and interactions created by factor-variable operators are
referred to as virtual variables.  They act like variables in varlists
but do not exist in the dataset.
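
To see what "virtual" means in practice, the sketch below expands a
factor-variable term with fvexpand and then checks that using the term
did not add any variables to the dataset.  The auto dataset and the
local macro name k_before are assumptions made for this illustration.

```stata
sysuse auto, clear

* Record how many variables the dataset holds before using i.rep78
local k_before = c(k)

* Show the virtual variables that i.rep78 stands for
fvexpand i.rep78
display "`r(varlist)'"

* Use the virtual variables as if they were ordinary variables
summarize i.rep78
regress price i.rep78

* No indicator variables were created in the dataset
assert c(k) == `k_before'
```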

Categorical variables to which factor-variable operators are applied must
contain nonnegative integers with values in the range 0 to 32,740,
inclusive.
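
One way to check this requirement before applying i. to a variable is
sketched below.  The auto dataset and the new variable name gear_id are
assumptions for this example; mapping noninteger values to integer codes
with egen's group() function is one possible workaround, not the only
one.

```stata
sysuse auto, clear

* rep78 qualifies: nonnegative integers within 0 to 32,740
assert rep78 == floor(rep78) & inrange(rep78, 0, 32740) if !missing(rep78)

* gear_ratio holds noninteger values, so i.gear_ratio should be rejected
capture noisily regress price i.gear_ratio
display _rc

* Map the distinct values to integer codes 1, 2, ... and use those instead
egen gear_id = group(gear_ratio)
regress price i.gear_id
```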

Factor variables may be combined with the L. and F. time-series
operators.
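
A minimal sketch of combining the two kinds of operators follows.  The
data are simulated, the variable names t, grp, and y are hypothetical,
and the form iL.grp (indicators for the levels of the first lag of grp)
is assumed here to be an accepted spelling of the combination.

```stata
* Build a small simulated time series
clear
set obs 40
set seed 12345
generate t = _n
tsset t
generate byte grp = mod(_n, 3)
generate y = rnormal()

* Indicators for the levels of grp in the previous period
regress y iL.grp
```

The first observation has no lagged value, so it drops out of the
estimation sample.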

Remarks

Remarks are presented under the following headings:

        Basic examples
        Base levels
        Selecting levels
        Applying operators to a group of variables
        Video examples

Basic examples

Here are some examples of how the operators are used:

Factor
specification       Result
--------------------------------------------------------------------
i.group             indicators for levels of group

i.group#i.sex       indicators for each combination of levels of
                    group and sex, a two-way interaction

group#sex           same as i.group#i.sex

group#sex#arm       indicators for each combination of levels of
                    group, sex, and arm, a three-way interaction

group##sex          same as i.group i.sex group#sex

group##sex##arm     same as i.group i.sex i.arm group#sex group#arm
                    sex#arm group#sex#arm

sex#c.age           two variables -- age for males and 0 elsewhere,
                    and age for females and 0 elsewhere; if age is
                    also in the model, one of the two virtual
                    variables will be treated as a base

sex##c.age          same as i.sex age sex#c.age

c.age               same as age

c.age#c.age         age squared

c.age#c.age#c.age   age cubed
--------------------------------------------------------------------
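
Two of the equivalences in the table can be checked numerically.  The
sketch below assumes the auto dataset, with foreign standing in for sex
and mpg for age; the scalar names and the variable mpg2 are hypothetical.
It compares R-squared across specifications the table describes as the
same.

```stata
sysuse auto, clear

* foreign##c.mpg versus its written-out form
regress price foreign##c.mpg
scalar r2_short = e(r2)
regress price i.foreign mpg foreign#c.mpg
assert reldif(e(r2), r2_short) < 1e-10

* c.mpg#c.mpg versus a hand-built squared term
generate double mpg2 = mpg^2
regress price mpg c.mpg#c.mpg
scalar r2_fv = e(r2)
regress price mpg mpg2
assert reldif(e(r2), r2_fv) < 1e-10
```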

Base levels

You can specify the base level of a factor variable by using the ib.
operator.  The syntax is

Base
operator(*)    Description
------------------------------------------------------------------
ib#.           use # as base, #=value of variable
ib(##).        use the #th ordered value as base (**)
ib(first).     use smallest value as base (the default)
ib(last).      use largest value as base
ib(freq).      use most frequent value as base
ibn.           no base level
------------------------------------------------------------------
 (*) The i may be omitted.  For instance, you may type ib2.group
     or b2.group.
(**) For example, ib(#2). means to use the second value as the
     base.

Thus, if you want to use group=3 as the base in a regression, you can
type

        . regress y i.sex ib3.group

You can also permanently set the base levels of categorical variables by
using the fvset command.
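
The sketch below contrasts the two approaches, assuming the auto dataset
with rep78 as the categorical variable; the scalar name b1 is
hypothetical.  It checks that a base set with fvset gives the same
coefficient as the same base requested with ib3.

```stata
sysuse auto, clear

* Base level chosen for this command only
regress price ib3.rep78
scalar b1 = _b[1.rep78]

* Base level recorded with the variable; plain i.rep78 now uses it
fvset base 3 rep78
regress price i.rep78
assert reldif(_b[1.rep78], b1) < 1e-10

* Other base choices from the table
regress price ib(last).rep78
regress price ib(freq).rep78

* Remove the recorded setting so the default base applies again
fvset clear rep78
```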

Selecting levels

You can select a range of levels -- a range of virtual variables -- by
using the i(numlist). operator.

Examples          Description
--------------------------------------------------------------------
i2.cat            a single indicator for cat=2

2.cat             same as i2.cat

i(2 3 4).cat      three indicators, cat=2, cat=3, and cat=4;
                  same as i2.cat i3.cat i4.cat

i(2/4).cat        same as i(2 3 4).cat

2.cat#1.sex       a single indicator that is 1 when cat=2 and sex=1,
                  and is 0 otherwise

i2.cat#i1.sex     same as 2.cat#1.sex
--------------------------------------------------------------------
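
A short sketch of level selection follows, assuming the auto dataset
with rep78 in the role of cat and foreign in the role of sex.

```stata
sysuse auto, clear

* Show which virtual variables the selection stands for
fvexpand i(3/5).rep78
display "`r(varlist)'"

* Use only the selected indicators in a model
regress price i(3/5).rep78

* A single indicator: 1 when rep78=3 and foreign=1, and 0 otherwise
summarize 3.rep78#1.foreign
```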

Rather than selecting the levels that should be included, you can specify
the levels that should be omitted by using the o. operator.  When you use
io(numlist).varname in a command, indicators for the levels of varname
other than those specified in numlist are included.  The i may be
omitted: o. implies i., so o(numlist).varname includes the same remaining
indicators.

Examples          Description
--------------------------------------------------------------------
io2.cat           indicators for levels of cat, omitting the
                  indicator for cat=2

o2.cat            same as io2.cat

io(2 3 4).cat     indicators for levels of cat, omitting three
                  indicators, cat=2, cat=3, and cat=4

o(2 3 4).cat      same as io(2 3 4).cat

o(2/4).cat        same as io(2 3 4).cat

o2.cat#o1.sex     indicators for each combination of the levels of
                  cat and sex, omitting the indicator for cat=2
                  and sex=1
--------------------------------------------------------------------
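
The omit operator can be tried in the same way.  This sketch assumes the
auto dataset with rep78 in the role of cat; it expands the term to show
how the omitted levels are marked and then fits a model with the
remaining indicators.

```stata
sysuse auto, clear

* Expand the term to see the omitted and remaining levels
fvexpand io(1 2).rep78
display "`r(varlist)'"

* The i may be dropped; o(1 2).rep78 requests the same indicators
regress price o(1 2).rep78
```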

Applying operators to a group of variables

Factor-variable operators may be applied to groups of variables by using
parentheses.

In the examples that follow, variables group, sex, arm, and cat are
categorical, and variables age, wt, and bp are continuous:

Examples                  Expansion
--------------------------------------------------------------------
i.(group sex arm)         i.group i.sex i.arm

group#(sex arm cat)       group#sex group#arm group#cat

group##(sex arm cat)      i.group i.sex i.arm i.cat group#sex
                          group#arm group#cat

group#(c.age c.wt c.bp)   i.group group#c.age group#c.wt group#c.bp

group#c.(age wt bp)       same as group#(c.age c.wt c.bp)
--------------------------------------------------------------------
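
The grouped forms can be compared with their written-out forms.  The
sketch below assumes the auto dataset, with foreign and rep78 as the
categorical variables and mpg and weight as the continuous ones; the
scalar name r2_grouped is hypothetical.

```stata
sysuse auto, clear

* i. applied to a group of variables
fvexpand i.(foreign rep78)
display "`r(varlist)'"

* c. applied to a group inside an interaction
regress price foreign#c.(mpg weight)
scalar r2_grouped = e(r2)

* The same interactions written out one by one
regress price foreign#c.mpg foreign#c.weight
assert reldif(e(r2), r2_grouped) < 1e-10
```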

Video examples

Introduction to factor variables in Stata, part 1: The basics

Introduction to factor variables in Stata, part 2: Interactions

Introduction to factor variables in Stata, part 3: More interactions

