
The display of value labels on factor variables in estimation output was introduced in Stata 13.


See the new features in Stata 18.

Factor variables and value labels


Highlights

  • Value labels on factor variables now appear in estimation output
  • Value labels appear on contrasts, margins, and pairwise comparisons



Show me

A factor variable is a categorical variable. It might be

  • attitude, measured on a scale of 1 to 5;
  • agegrp, recorded 1 to 4, with 1 being 20-30, 2 being 31-40, ...; or
  • region, recorded as 1 (North East), 2 (North Central), ...

When you fit a model, Stata allows factor-variable notation. You can type

i.attitude

to obtain the levels of factor variable attitude.

i.attitude#c.age

to obtain the levels of attitude interacted with continuous variable age

i.attitude##c.age

to mean i.attitude age i.attitude#c.age

i.attitude#i.agegrp

to obtain the levels of attitude interacted with the levels of agegrp

i.attitude##i.agegrp

to mean i.attitude i.agegrp i.attitude#i.agegrp

i.attitude#i.agegrp#i.region

to obtain the levels of attitude interacted with the levels of agegrp interacted with the levels of region

i.attitude##i.agegrp##i.region

to mean

	i.attitude  i.agegrp  i.region
	i.attitude#i.agegrp   i.attitude#i.region   i.agegrp#i.region
	i.attitude#i.agegrp#i.region

i.(attitude agegrp)

to mean i.attitude i.agegrp

i.(attitude agegrp)##i.region

to mean i.attitude##i.region i.agegrp##i.region

and so on.
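
One way to check how a factor-variable expression expands is Stata's fvexpand command, which leaves the expanded term list in r(varlist). The sketch below is only illustrative: the attitude, agegrp, and region data used on this page are not distributed, so it assumes the auto dataset that ships with Stata as a stand-in, with rep78 and foreign playing the categorical variables and mpg the continuous one.

```stata
* Stand-in data (an assumption): the auto dataset shipped with Stata
sysuse auto, clear

* Main effects only: one term per level of rep78
fvexpand i.rep78
display "`r(varlist)'"

* Full factorial: both main effects plus their interaction
fvexpand i.rep78##i.foreign
display "`r(varlist)'"

* Categorical-by-continuous: i.foreign, mpg, and i.foreign#c.mpg
fvexpand i.foreign##c.mpg
display "`r(varlist)'"

* Parentheses apply the i. operator to each variable inside
fvexpand i.(rep78 foreign)
display "`r(varlist)'"
```

In the displayed lists, a level marked with b (for example, 0b.foreign) is the base level, which is omitted from the fitted model.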

Stata also has value labels. You might type

. label define regions 1 "North East" 2 "North Central" 3 "South" 4 "West"

. label values region regions
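
To confirm that a label set is defined and attached as intended, it can help to list it and tabulate the variable with and without labels. The following is a minimal sketch on a made-up four-observation dataset (the dataset is an assumption for illustration; only the label definitions come from the text above).

```stata
* Hypothetical data: one observation per region code
clear
set obs 4
generate byte region = _n

* Define the label set, then attach it to the variable
label define regions 1 "North East" 2 "North Central" 3 "South" 4 "West"
label values region regions

* Inspect the code-to-text mapping
label list regions

* Tabulate with labels, then with the underlying codes 1-4
tabulate region
tabulate region, nolabel
```

The stored data remain the integers 1 to 4; the labels only change how those values are displayed.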

In Stata 13, when you fit a model using factor-variable notation, the labels appear in the output:

. regress  y  i.attitude i.agegrp i.region

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F( 10,   389) =   22.60
       Model |  2668.04079    10  266.804079           Prob > F      =  0.0000
    Residual |  4592.44366   389  11.8057678           R-squared     =  0.3675
-------------+------------------------------           Adj R-squared =  0.3512
       Total |  7260.48445   399  18.1967029           Root MSE      =   3.436

---------------------------------------------------------------------------------
              y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
       attitude |
      disagree  |    1.27901   .5617435     2.28   0.023     .1745764    2.383443
       neutral  |   1.466543   .5304032     2.76   0.006     .4237268    2.509358
         agree  |   2.063136   .5326997     3.87   0.000     1.015805    3.110467
strongly agree  |   3.550927   .5801312     6.12   0.000     2.410343    4.691512
                |
         agegrp |
         31-40  |   2.114168   .4868806     4.34   0.000     1.156921    3.071414
         41-50  |   3.970627   .4866537     8.16   0.000     3.013826    4.927428
           50+  |   5.990408   .4869362    12.30   0.000     5.033052    6.947764
                |
         region |
 North Central  |    .673176   .4913976     1.37   0.172    -.2929515    1.639304
         South  |  -1.366099    .491862    -2.78   0.006     -2.33314   -.3990588
          West  |  -1.477714   .4890703    -3.02   0.003    -2.439266   -.5161623
                |
          _cons |   8.411983   .5760115    14.60   0.000     7.279498    9.544468
---------------------------------------------------------------------------------
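
How the levels are shown can be adjusted with display options on the estimation command. The sketch below assumes Stata 13 or later and again uses the auto dataset as a stand-in (an assumption, since the data behind the output above are not distributed); in that dataset, foreign carries the value labels Domestic and Foreign.

```stata
* Stand-in data (an assumption): the auto dataset shipped with Stata
sysuse auto, clear

* Default: the labeled level "Foreign" appears in the table
regress price i.foreign mpg

* Show the numeric level (1) instead of its value label
regress price i.foreign mpg, nofvlabel

* Also list the omitted base level in the table
regress price i.foreign mpg, baselevels

* Pick a different base level: here level 1 becomes the reference
regress price ib1.foreign mpg
```

The base level is why each factor in the output above shows one row fewer than its number of levels: the lowest level is the reference by default.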

Value labels are also used by Stata's postestimation commands. Below we use pwcompare to compare the predicted values of y between each pair of age groups:

. pwcompare agegrp

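Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------------------
                |                            Unadjusted
                |   Contrast   Std. Err.     [95% Conf. Interval]
----------------+------------------------------------------------
         agegrp |
31-40 vs 20-30  |   2.114168   .4868806      1.156921    3.071414
41-50 vs 20-30  |   3.970627   .4866537      3.013826    4.927428
  50+ vs 20-30  |   5.990408   .4869362      5.033052    6.947764
41-50 vs 31-40  |   1.856459   .4869484      .8990793    2.813839
  50+ vs 31-40  |    3.87624   .4870898      2.918582    4.833898
  50+ vs 41-50  |   2.019781   .4878207      1.060686    2.978876
-----------------------------------------------------------------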

For instance, 31–40 year olds have an average value of y that is 2.11 higher than that of 20–30 year olds, controlling for the other covariates in the model.
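
Each contrast in the table is a difference between two regression coefficients; for example, 41-50 vs 31-40 is 3.970627 - 2.114168 = 1.856459. The sketch below shows one way to request tests and multiple-comparison adjustments and to reproduce a single contrast by hand. It assumes the auto dataset as a stand-in, and the labels attached to rep78 are hypothetical wording added for illustration (the dataset stores only the codes 1 to 5).

```stata
* Stand-in data (an assumption): the auto dataset shipped with Stata
sysuse auto, clear

* Hypothetical labels for rep78, defined here only for illustration
label define repair 1 "Poor" 2 "Fair" 3 "Average" 4 "Good" 5 "Excellent"
label values rep78 repair

regress price i.rep78 mpg

* Pairwise differences with test statistics and p-values
pwcompare rep78, effects

* The same comparisons with Bonferroni-adjusted p-values and intervals
pwcompare rep78, effects mcompare(bonferroni)

* One contrast by hand: level 3 minus level 2
lincom 3.rep78 - 2.rep78
```

The lincom estimate should match the corresponding unadjusted pwcompare row, because both are the same linear combination of coefficients.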

Show me more

To learn more about factor variables, see the manual entry.

To learn more about pwcompare, see its manual entry.

