Factor variables

Stata handles factor (categorical) variables elegantly. Prefix a variable with i. to specify indicators for each level (category) of the variable. Put a # between two variables to create an interaction: indicators for each combination of the categories of the variables. Put ## instead to specify a full factorial of the variables: main effects for each variable plus their interaction. To interact a continuous variable with a factor variable, prefix the continuous variable with c. (for example, c.bmi). You can specify up to eight-way interactions.
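The operators above can be tried on any dataset. The sketch below is illustrative only: it assumes the auto dataset that ships with Stata (variables price, rep78, foreign, and weight), which is not the cholesterol data analyzed on this page.

```stata
* Assumption: the auto example dataset stands in for your own data
sysuse auto, clear

* i.  indicators for each level of rep78 (the base level is omitted)
regress price i.rep78

* #   interaction only: one term per combination of levels
*     (some combinations do not occur in this dataset)
regress price i.foreign#i.rep78

* ##  full factorial: both main effects plus the interaction
regress price i.foreign##i.rep78

* c.  interact a continuous variable with a factor variable
regress price i.foreign##c.weight
```

Each command fits a separate model, so the outputs can be compared to see which terms each operator adds.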

We run a linear regression of cholesterol level on a full factorial of age group and whether the person smokes, along with continuous body mass index (bmi) and its interaction with whether the person smokes.

. regress cholesterol i.smoker##agegrp bmi i.smoker#c.bmi


      Source |       SS           df       MS      Number of obs   =     4,049
-------------+----------------------------------   F(9, 4039)      =     15.30
       Model |  137.845627         9  15.3161808   Prob > F        =    0.0000
    Residual |  4044.55849     4,039   1.0013762   R-squared       =    0.0330
-------------+----------------------------------   Adj R-squared   =    0.0308
       Total |  4182.40412     4,048   1.0332026   Root MSE        =    1.0007

-------------------------------------------------------------------------------
  cholesterol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
       smoker |
      smoker  |  -.7699108    .337665    -2.28   0.023    -1.431921   -.1079012
              |
       agegrp |
       45-49  |   .1554985   .0620537     2.51   0.012     .0338391    .2771579
       50-54  |   .1838839   .0618467     2.97   0.003     .0626303    .3051375
       55-59  |   .1746813   .0763244     2.29   0.022     .0250433    .3243193
              |
smoker#agegrp |
smoker#45-49  |   -.118553   .1367914    -0.87   0.386    -.3867396    .1496336
smoker#50-54  |  -.1332379   .1363604    -0.98   0.329    -.4005796    .1341038
smoker#55-59  |  -.2466412   .1717679    -1.44   0.151    -.5834009    .0901185
              |
          bmi |   .0253916   .0059336     4.28   0.000     .0137585    .0370246
              |
 smoker#c.bmi |
      smoker  |   .0501707   .0129223     3.88   0.000     .0248358    .0755055
              |
        _cons |   5.437234   .1520921    35.75   0.000     5.139049    5.735418
-------------------------------------------------------------------------------

We could have used parenthesis binding to type the same model more briefly:

. regress cholesterol smoker##(agegrp c.bmi)

Base levels can be changed on the fly: i.agegrp uses the default base level (the lowest level, here 1), whereas b3.agegrp makes 3 the base level.
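As a brief sketch of changing the base level, the commands below again assume the auto dataset shipped with Stata rather than the data used on this page. The ib3. prefix is the longer spelling of b3., and fvset base is shown as one way to make the choice persist for a variable.

```stata
* Assumption: the auto example dataset stands in for your own data
sysuse auto, clear

* Default: the lowest level of rep78 is the base
regress price i.rep78

* Make level 3 the base for this command only
regress price ib3.rep78

* Use the last (highest) level as the base
regress price ib(last).rep78

* Set level 3 as the default base for rep78 from now on
fvset base 3 rep78
regress price i.rep78
```

After fvset base, the final regress reports the same coefficients as the ib3.rep78 fit.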

The level indicator variables are not created in your dataset. Stata constructs them as needed, which saves memory.
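One way to see this for yourself is to compare the dataset before and after fitting a model. This is a minimal sketch under the assumption that the auto dataset shipped with Stata is an acceptable stand-in.

```stata
* Assumption: the auto example dataset stands in for your own data
sysuse auto, clear

* Note the number of variables before estimation
describe, short

regress price i.rep78

* The variable count is unchanged: no indicator variables were added
describe, short

* The indicators can still be viewed on the fly
list rep78 i.rep78 in 1/5
```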

Factor variables are integrated deeply into Stata’s processing of variable lists, providing a consistent way of interacting with both estimation and postestimation commands.
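To illustrate the postestimation side, the sketch below fits a model with factor-variable notation and then reuses the same notation in margins and testparm. It assumes the auto dataset shipped with Stata, not the cholesterol data shown on this page.

```stata
* Assumption: the auto example dataset stands in for your own data
sysuse auto, clear

regress price i.foreign##c.weight

* Predictive margins for each level of foreign
margins foreign

* Marginal effect of weight within each level of foreign
margins foreign, dydx(weight)

* Test of the interaction terms
testparm i.foreign#c.weight
```

Because the model was specified with i. and c., margins knows which variables are categorical and which are continuous without further instruction.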


