
Factor variables


Stata handles factor (categorical) variables elegantly. Prefix a variable with i. to specify indicators for each level (category) of the variable. Put a # between two variables to create an interaction: indicators for each combination of the categories of the variables. Put ## instead to specify a full factorial of the variables, that is, main effects for each variable plus their interaction. To interact a continuous variable with a factor variable, prefix the continuous variable with c. (for example, c.bmi). You can specify up to eight-way interactions.
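As a minimal sketch of the four operators described above, the commands below use the auto dataset that ships with Stata rather than the cholesterol data analyzed on this page. The choice of price, rep78, foreign, and mpg is purely illustrative.

```stata
* Illustration only: the auto dataset ships with Stata; it is not the
* cholesterol data used elsewhere on this page.
sysuse auto, clear

* i.  : indicators for each level of rep78 (the lowest level is the base)
regress price i.rep78

* #   : interaction terms only (indicators for each combination of levels)
regress price i.foreign#i.rep78

* ##  : full factorial (main effects plus the interaction)
regress price i.foreign##i.rep78

* c.  : mark mpg as continuous when interacting it with a factor variable
regress price i.foreign##c.mpg
```

Some combinations of foreign and rep78 do not occur in this dataset, so Stata reports the corresponding interaction cells as empty and omits them.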

Below, we fit a linear regression of cholesterol level on a full factorial of age group and smoking status, along with continuous body mass index (bmi) and its interaction with smoking status.

. regress cholesterol smoker##agegrp bmi smoker#c.bmi
      Source |       SS       df       MS              Number of obs =    4049
-------------+------------------------------           F(  9,  4039) =   11.01
       Model |  128.054581     9  14.2282868           Prob > F      =  0.0000
    Residual |  5217.55346  4039  1.29179338           R-squared     =  0.0240
-------------+------------------------------           Adj R-squared =  0.0218
       Total |  5345.60804  4048  1.32055535           Root MSE      =  1.1366

--------------------------------------------------------------------------------
   cholesterol |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        smoker |
        smoker |  -.7440153   .2860484    -2.60   0.009    -1.304828   -.1832026
               |
        agegrp |
         45-49 |   .1109246   .0742678     1.49   0.135    -.0346813    .2565305
         50-54 |   .1517449   .0709628     2.14   0.033     .0126186    .2908712
         55-59 |   .1930751   .0739847     2.61   0.009     .0480244    .3381258
               |
 smoker#agegrp |
  smoker#45-49 |   -.131567   .1001652    -1.31   0.189    -.3279462    .0648121
  smoker#50-54 |   -.117271   .0985225    -1.19   0.234    -.3104295    .0758875
  smoker#55-59 |  -.2302833   .1032766    -2.23   0.026    -.4327625   -.0278042
               |
           bmi |   .0278583   .0080532     3.46   0.001     .0120695    .0436471
               |
  smoker#c.bmi |
        smoker |   .0343858   .0107435     3.20   0.001     .0133226     .055449
               |
         _cons |   5.511677   .2155833    25.57   0.000     5.089015    5.934339
--------------------------------------------------------------------------------

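We could have used parenthesis binding to type the same model more briefly. The ## operator distributes over the terms inside the parentheses, so the command below expands to smoker##agegrp smoker##c.bmi: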

. regress cholesterol smoker##(agegrp c.bmi)
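One way to convince yourself that the two spellings specify the same model is to fit both and compare a summary statistic. The sketch below does this on the auto dataset that ships with Stata (an assumption made for illustration, since the cholesterol data are not distributed with this page). The scalar names r2_long and r2_short are hypothetical.

```stata
* Illustration only: auto dataset, not the cholesterol data.
sysuse auto, clear

* Long form: full factorial of foreign and rep78, plus mpg and
* its interaction with foreign
regress price i.foreign##i.rep78 c.mpg i.foreign#c.mpg
scalar r2_long = e(r2)

* Short form: parenthesis binding distributes ## over both terms
regress price i.foreign##(i.rep78 c.mpg)
scalar r2_short = e(r2)

* Both fits should report the same R-squared
display "R-squared, long form:  " r2_long
display "R-squared, short form: " r2_short
```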

Base levels can be changed on the fly: i.agegrp uses the default base level (the lowest value, here 1), whereas b3.agegrp makes 3 the base level.
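A short sketch of changing the base level, again using the auto dataset that ships with Stata as a stand-in for the cholesterol data. The variable rep78 takes the values 1 through 5.

```stata
* Illustration only: auto dataset, not the cholesterol data.
sysuse auto, clear

* Default: the lowest level of rep78 (1) is the base
regress price i.rep78

* Make level 3 the base for this command only
regress price ib3.rep78

* Use the highest level as the base
regress price ib(last).rep78

* Make level 3 the default base for rep78 in later commands
fvset base 3 rep78
regress price i.rep78
```

The ib3. and b3. prefixes are interchangeable. fvset records the setting with the variable, which is convenient when the same base is wanted across many commands.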

The level indicator variables are not created in your dataset; Stata generates them as needed during estimation, which saves memory and keeps the dataset uncluttered.

Factor variables are integrated deeply into Stata’s processing of variable lists, providing a consistent way of interacting with both estimation and postestimation commands.
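As a rough illustration of that integration, the sketch below fits a model with factor-variable notation and then reuses the same notation in a few postestimation commands. It uses the auto dataset that ships with Stata, not the cholesterol data, and the particular commands shown are only a sample.

```stata
* Illustration only: auto dataset, not the cholesterol data.
sysuse auto, clear

regress price i.foreign##c.mpg

* Joint test of the interaction terms, named with the same notation
testparm i.foreign#c.mpg

* Predictive margins of price for each level of foreign
margins foreign

* Average marginal effect of mpg within each level of foreign
margins foreign, dydx(mpg)
```

Because the model was specified with i. and c., margins knows which variables are categorical and which are continuous, and it handles the interaction accordingly.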
