Factor variables and value labels


Highlights

  • Value labels on factor variables now appear in estimation output

  • Value labels appear on contrasts, margins, and pairwise comparisons

A factor variable might be

  • attitude measured on a scale of 1 to 5,

  • agegrp recorded 1 to 4, 1 being 20-30, 2 being 31-40, ...

  • region being 1 (North East), 2 (North Central), ...

When you fit a model, Stata allows factor-variable notation. You can type

i.attitude

to obtain the levels of factor variable attitude.

i.attitude#c.age

to obtain the levels of attitude interacted with continuous variable age

i.attitude##c.age

to mean i.attitude age i.attitude#c.age

i.attitude#i.agegrp

to obtain the levels of attitude interacted with the levels of agegrp

i.attitude##i.agegrp

to mean i.attitude i.agegrp i.attitude#i.agegrp

i.attitude#i.agegrp#i.region

to obtain the levels of attitude interacted with the levels of agegrp interacted with the levels of region

i.attitude##i.agegrp##i.region

to mean i.attitude i.agegrp i.region i.attitude#i.agegrp i.attitude#i.region i.agegrp#i.region i.attitude#i.agegrp#i.region

i.(attitude agegrp)

to mean i.attitude i.agegrp

i.(attitude agegrp)##i.region

to mean i.attitude##i.region i.agegrp##i.region

and so on.
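If you want to check what a shorthand stands for, one option is to ask Stata to expand it with fvexpand and then fit both the shorthand and the spelled-out form. The sketch below is only an illustration: the data are simulated, the seed and the coefficients used to generate y are made up, and runiformint() assumes Stata 14 or later.

```stata
* Simulated data: variable names follow the text, values are made up
clear
set seed 2024
set obs 400
generate attitude = runiformint(1, 5)
generate agegrp   = runiformint(1, 4)
generate y        = 9 + 0.8*attitude + 1.9*agegrp + rnormal(0, 3.7)

* List the terms that the ## shorthand expands to
fvexpand i.attitude##i.agegrp
display "`r(varlist)'"

* Fit the shorthand and the spelled-out specification
quietly regress y i.attitude##i.agegrp
scalar r2_short = e(r2)
quietly regress y i.attitude i.agegrp i.attitude#i.agegrp
scalar r2_long = e(r2)

* Both specifications describe the same model, so the fits should agree
display r2_short
display r2_long
```

The two R-squared values should be identical, because ## is only a shorter way of writing the main effects plus their interaction.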

Stata also has value labels. You might type

. label define region 1 "North East" 2 "North Central" 3 "South" 4 "West"

. label values region region
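The same two steps work for any factor variable. As a small sketch, here is how attitude might be labeled; the labels for levels 2 through 5 are taken from the output shown on this page, while "Strongly disagree" for level 1 is our assumption, since the base level never appears in that output.

```stata
* Tiny illustrative dataset: one observation per attitude level
clear
set obs 5
generate attitude = _n

* Level 1's label is an assumption; levels 2-5 follow the output on this page
label define attitude 1 "Strongly disagree" 2 "Disagree" 3 "Neutral" 4 "Agree" 5 "Strongly agree"
label values attitude attitude

* Show the mapping, then the data with and without labels
label list attitude
list attitude
list attitude, nolabel
```

The variable still stores the numbers 1 to 5; the labels only change how those numbers are displayed.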

In Stata, when you fit a model using factor-variable notation, the labels appear in the output:

. regress  y  i.attitude i.agegrp i.region

      Source |       SS           df       MS      Number of obs   =       400
-------------+----------------------------------   F( 10, 389)     =     20.66
       Model |  2827.06814        10  282.706814   Prob > F        =    0.0000
    Residual |   5323.1334       389  13.6841476   R-squared       =    0.3469
-------------+----------------------------------   Adj R-squared   =    0.3301
       Total |  8150.20154       399  20.4265703   Root MSE        =    3.6992

--------------------------------------------------------------------------------
              y | Coefficient  Std. err.      t    P>|t|    [95% conf. interval]
----------------+---------------------------------------------------------------
       attitude |
       Disagree |   1.249792   .5911202     2.11   0.035    .0876016    2.411982
        Neutral |   1.556975   .6055619     2.57   0.011    .3663909    2.747558
          Agree |   1.929689   .6013783     3.21   0.001    .7473308    3.112048
 Strongly agree |   3.449392   .5878003     5.87   0.000    2.293729    4.605055
                |
         agegrp |
          31-40 |   1.876376   .5265159     3.56   0.000    .8412029    2.911549
          41-50 |   3.583126   .5174298     6.92   0.000    2.565817    4.600435
            50+ |     5.8832   .5266951    11.17   0.000    4.847675    6.918726
                |
         region |
  North Central |    .447837   .5133434     0.87   0.384   -.5614377    1.457112
          South |  -1.397359   .5263279    -2.65   0.008   -2.432163   -.3625559
           West |  -2.065611   .5209034    -3.97   0.000   -3.089749   -1.041473
                |
          _cons |   9.115172   .6209843    14.68   0.000    7.894266    10.33608
--------------------------------------------------------------------------------
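If you would rather see the numeric levels than the labels, the nofvlabel display option switches them off for a given table. The sketch below uses simulated data (the seed and the coefficients behind y are made up, and runiformint() assumes Stata 14 or later), so its numbers will not match the output above.

```stata
* Simulated data: values are made up, so results differ from the page's output
clear
set seed 2024
set obs 400
generate region = runiformint(1, 4)
generate y      = 9 - 0.7*region + rnormal(0, 3.7)

label define region 1 "North East" 2 "North Central" 3 "South" 4 "West"
label values region region

* Coefficient rows are titled with the value labels
regress y i.region

* Replay the same results with the numeric levels (2, 3, 4) instead
regress, nofvlabel
```

Only the row titles change between the two tables; the estimates are the same.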

Value labels are also used by Stata's postestimation commands. Below we use pwcompare to compare the average value of y between each pair of age groups:

. pwcompare agegrp

Pairwise comparisons of marginal linear predictions

Margins: asbalanced

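--------------------------------------------------------------------
                   |                                  Unadjusted
                   |   Contrast   Std. Err.     [95% Conf. Interval]
-------------------+------------------------------------------------
            agegrp |
    31-40 vs 20-30 |   1.876376   .5265159      .8412029    2.911549
    41-50 vs 20-30 |   3.583126   .5174298      2.565817    4.600435
      50+ vs 20-30 |     5.8832   .5266951      4.847675    6.918726
    41-50 vs 31-40 |    1.70675   .5264024         .6718    2.741699
      50+ vs 31-40 |   4.006824   .5373647      2.950322    5.063327
      50+ vs 41-50 |   2.300075   .5304714      1.257125    3.343024
--------------------------------------------------------------------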
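The labels carry over to the other postestimation commands mentioned in the highlights. As a rough sketch on simulated data (the seed and the coefficients behind y are made up, and runiformint() assumes Stata 14 or later), you might try pwcompare with tests and a multiple-comparison adjustment, then margins and contrast:

```stata
* Simulated data: values are made up, so results differ from the page's output
clear
set seed 2024
set obs 400
generate agegrp = runiformint(1, 4)
generate y      = 9 + 1.9*agegrp + rnormal(0, 3.7)

label define agegrp 1 "20-30" 2 "31-40" 3 "41-50" 4 "50+"
label values agegrp agegrp

quietly regress y i.agegrp

* Pairwise comparisons with test statistics and Bonferroni-adjusted p-values
pwcompare agegrp, effects mcompare(bonferroni)

* Predictive margins for each age group, titled by label
margins agegrp

* Contrasts of each age group with the reference (first) level
contrast r.agegrp
```

Each table should title its rows with 20-30, 31-40, and so on rather than 1, 2, 3, 4.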

For instance, 31-40 year olds have an average value of y that is 1.88 higher than that of 20-30 year olds, controlling for the other covariates in the model.
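Comparisons that do not involve the base group are differences between the agegrp coefficients in the regression table. A quick arithmetic check, using the coefficients printed on this page, reproduces two of the contrasts:

```stata
* Coefficients copied from the regression output on this page
* 41-50 vs 31-40: should be about 1.70675
display 3.583126 - 1.876376

* 50+ vs 31-40: should be about 4.006824
display 5.8832 - 1.876376
```

After fitting the model yourself, lincom 3.agegrp - 2.agegrp would give the first of these differences along with its standard error.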

Tell me more

To learn more about factor variables, see the manual entry [U] 11.4.3 Factor variables, or type help fvvarlist in Stata.

To learn more about pwcompare, see its manual entry, [R] pwcompare.