## Stata 15 help for fvvarlist

Title

[U] 11.4.3 Factor variables

Description

Factor variables are extensions of varlists of existing variables. When a command allows factor variables, in addition to typing variable names from your data, you can type factor variables, which might look like

i.varname

i.varname#i.varname

i.varname#i.varname#i.varname

i.varname##i.varname

i.varname##i.varname##i.varname

Factor variables create indicator variables from categorical variables, interactions of indicators of categorical variables, interactions of categorical and continuous variables, and interactions of continuous variables (polynomials). They are allowed with most estimation and postestimation commands, along with a few other commands.

There are five factor-variable operators:

    Operator  Description
    --------------------------------------------------------------------
    i.        unary operator to specify indicators
    c.        unary operator to treat as continuous
    o.        unary operator to omit a variable or indicator
    #         binary operator to specify interactions
    ##        binary operator to specify factorial interactions
    --------------------------------------------------------------------

The indicators and interactions created by factor-variable operators are referred to as virtual variables. They act like variables in varlists but do not exist in the dataset.
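As a minimal sketch of what "virtual" means here, the following counts the variables in the dataset before and after fitting a model that uses i.rep78. The auto dataset shipped with Stata and the choice of price and rep78 are assumptions made for illustration; they are not part of this help file.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* Record the number of variables before estimation
scalar k_before = c(k)

* i.rep78 expands to one indicator per level of rep78
regress price i.rep78

* The indicators were used in the model, but none were added to the data
scalar k_after = c(k)
display "variables before: " k_before "   after: " k_after

* fvexpand lists the virtual variables that i.rep78 stands for
fvexpand i.rep78
display "`r(varlist)'"
```

If the two counts match, the indicators exist only for the duration of the command.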

Categorical variables to which factor-variable operators are applied must contain nonnegative integers with values in the range 0 to 32,740, inclusive.

Factor variables may be combined with the L. and F. time-series operators.

Remarks

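Remarks are presented under the following headings:

    Basic examples
    Base levels
    Selecting levels
    Applying operators to a group of variables
    Video examples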

Basic examples

Here are some examples of use of the operators:

    Factor specification  Result
    --------------------------------------------------------------------
    i.group               indicators for levels of group
    i.group#i.sex         indicators for each combination of levels of
                          group and sex, a two-way interaction
    group#sex             same as i.group#i.sex
    group#sex#arm         indicators for each combination of levels of
                          group, sex, and arm, a three-way interaction
    group##sex            same as i.group i.sex group#sex
    group##sex##arm       same as i.group i.sex i.arm group#sex
                          group#arm sex#arm group#sex#arm
    sex#c.age             two variables -- age for males and 0
                          elsewhere, and age for females and 0
                          elsewhere; if age is also in the model, one
                          of the two virtual variables will be treated
                          as a base
    sex##c.age            same as i.sex age sex#c.age
    c.age                 same as age
    c.age#c.age           age squared
    c.age#c.age#c.age     age cubed
    --------------------------------------------------------------------
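The equivalences in the table can be checked by fitting each pair of specifications and comparing the fit. This is a sketch only: the auto dataset, the variables price, foreign, and mpg, and the generated variable mpg_sq are assumptions chosen for illustration.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* Factorial interaction of a categorical and a continuous variable
regress price i.foreign##c.mpg
scalar r2_short = e(r2)

* The same model with ## written out term by term
regress price i.foreign mpg i.foreign#c.mpg
scalar r2_long = e(r2)

* c.mpg#c.mpg is mpg squared, so no generated variable is needed
regress price mpg c.mpg#c.mpg
scalar r2_fv = e(r2)

* For comparison, the squared term built by hand
* (mpg_sq is a hypothetical variable name)
generate double mpg_sq = mpg^2
regress price mpg mpg_sq
scalar r2_hand = e(r2)

display r2_short " " r2_long
display r2_fv " " r2_hand
```

Each pair of R-squared values should agree, because the two specifications in a pair describe the same set of regressors.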

Base levels

You can specify the base level of a factor variable by using the ib. operator. The syntax is

    Base operator(*)  Description
    ------------------------------------------------------------------
    ib#.              use # as base, #=value of variable
    ib(##).           use the #th ordered value as base (**)
    ib(first).        use smallest value as base (the default)
    ib(last).         use largest value as base
    ib(freq).         use most frequent value as base
    ibn.              no base level
    ------------------------------------------------------------------
    (*) The i may be omitted. For instance, you may type ib2.group
        or b2.group.
    (**) For example, ib(#2). means to use the second value as the
         base.

Thus, if you want to use group=3 as the base in a regression, you can type

. regress y i.sex ib3.group

You can also permanently set the base levels of categorical variables by using the fvset command.
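The two ways of choosing a base can be compared side by side. The sketch below assumes the auto dataset shipped with Stata, with rep78 standing in for group. It uses fvexpand only to show which level is marked as the base.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* Use rep78==3 as the base for this command only
regress price ib3.rep78

* fvexpand marks the base level with a b in the expanded name
fvexpand ib3.rep78
local one_off "`r(varlist)'"
display "`one_off'"

* Make 3 the default base for rep78 in later commands
fvset base 3 rep78
fvexpand i.rep78
local persistent "`r(varlist)'"
display "`persistent'"

* Restore the default (smallest value as base)
fvset clear rep78
```

The ib3. form affects a single command, whereas the fvset setting stays attached to the variable until it is cleared or changed.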

Selecting levels

You can select a range of levels -- a range of virtual variables -- by using the i(numlist). operator.

    Examples       Description
    --------------------------------------------------------------------
    i2.cat         a single indicator for cat=2
    2.cat          same as i2.cat
    i(2 3 4).cat   three indicators, cat=2, cat=3, and cat=4; same as
                   i2.cat i3.cat i4.cat
    i(2/4).cat     same as i(2 3 4).cat
    2.cat#1.sex    a single indicator that is 1 when cat=2 and sex=1,
                   and is 0 otherwise
    i2.cat#i1.sex  same as 2.cat#1.sex
    --------------------------------------------------------------------
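To make the level-selection syntax concrete, the sketch below expands a range and an explicit list, then compares a single interaction indicator with one built by hand. The auto dataset, the levels chosen, and the variable name cell are assumptions for illustration.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* A range and an explicit list select the same indicators
fvexpand i(2/4).rep78
local by_range "`r(varlist)'"
fvexpand i(2 3 4).rep78
local by_list "`r(varlist)'"
display "`by_range'"
display "`by_list'"

* A single indicator for one cell of an interaction:
* 1 when rep78==4 and foreign==1, and 0 otherwise
regress price 4.rep78#1.foreign
scalar b_fv = _b[4.rep78#1.foreign]

* The same indicator built by hand (cell is a hypothetical variable name);
* it is missing where rep78 is missing, to match the estimation sample
generate byte cell = (rep78 == 4 & foreign == 1) if !missing(rep78, foreign)
regress price cell
scalar b_hand = _b[cell]
display b_fv " " b_hand
```

The two coefficients should agree, since both regressions use the same indicator on the same observations.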

Rather than selecting the levels that should be included, you can specify the levels that should be omitted by using the o. operator. When you use io(numlist).varname in a command, indicators for the levels of varname other than those specified in numlist are included. When omitted levels are specified with the o. operator, the i. operator is implied, and the remaining indicators for the levels of varname will be included.

    Examples       Description
    --------------------------------------------------------------------
    io2.cat        indicators for levels of cat, omitting the indicator
                   for cat=2
    o2.cat         same as io2.cat
    io(2 3 4).cat  indicators for levels of cat, omitting three
                   indicators, cat=2, cat=3, and cat=4
    o(2 3 4).cat   same as io(2 3 4).cat
    o(2/4).cat     same as io(2 3 4).cat
    o2.cat#o1.sex  indicators for each combination of the levels of cat
                   and sex, omitting the indicator for cat=2 and sex=1
    --------------------------------------------------------------------
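The omit forms in the table can be compared in the same way. The sketch below assumes the auto dataset shipped with Stata, with rep78 standing in for cat. It expands the three spellings and then fits a model with one of them.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* io(), o(), and a range inside o() describe the same set of indicators
fvexpand io(2 3 4).rep78
local omit_io "`r(varlist)'"
fvexpand o(2 3 4).rep78
local omit_o "`r(varlist)'"
fvexpand o(2/4).rep78
local omit_range "`r(varlist)'"
display "`omit_io'"

* Fit a model that leaves out the indicators for rep78 = 2, 3, and 4
regress price o(2/4).rep78
```

In the regression output, the omitted levels are listed but are not estimated.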

Applying operators to a group of variables

Factor-variable operators may be applied to groups of variables by using parentheses.

In the examples that follow, variables group, sex, arm, and cat are categorical, and variables age, wt, and bp are continuous:

    Examples                 Expansion
    --------------------------------------------------------------------
    i.(group sex arm)        i.group i.sex i.arm
    group#(sex arm cat)      group#sex group#arm group#cat
    group##(sex arm cat)     i.group i.sex i.arm i.cat group#sex
                             group#arm group#cat
    group#(c.age c.wt c.bp)  i.group group#c.age group#c.wt group#c.bp
    group#c.(age wt bp)      same as group#(c.age c.wt c.bp)
    --------------------------------------------------------------------
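A short check of the parenthesized forms follows. It assumes the auto dataset shipped with Stata, with foreign and rep78 as the categorical variables and mpg and weight as the continuous ones. It compares only the pairs that the table describes as the same.

```stata
* Assumption: auto.dta, shipped with Stata, stands in for your own data
sysuse auto, clear

* c. distributed over a parenthesized group of continuous variables
regress price foreign#c.(mpg weight)
scalar r2_outer = e(r2)

* c. written on each variable inside the parentheses
regress price foreign#(c.mpg c.weight)
scalar r2_inner = e(r2)

* i. applied to a group of categorical variables
regress price i.(rep78 foreign)
scalar df_grouped = e(df_m)

* The same indicators requested one variable at a time
regress price i.rep78 i.foreign
scalar df_spelled = e(df_m)

display r2_outer " " r2_inner
display df_grouped " " df_spelled
```

Matching R-squared values and model degrees of freedom indicate that each pair expands to the same terms.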

Video examples

Introduction to factor variables in Stata, part 1: The basics

Introduction to factor variables in Stata, part 2: Interactions

Introduction to factor variables in Stata, part 3: More interactions