How to label the values of categorical variables

Labeling the categories of variables in a dataset is one of the most fundamental data management tasks. Even when the category numbers seem self-explanatory, it is better to label each category explicitly. These labels will appear in tables and graphs, so try to make them both short and accurate. Let's begin by opening an example dataset from the Stata website.

. use https://www.stata.com/users/youtube/rawdata.dta, clear
(Fictitious data based on the National Health and Nutrition Examination Survey)

Next, let's tabulate the variable sex.

. tabulate sex

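        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        676       53.31       53.31
          1 |        592       46.69      100.00
------------+-----------------------------------
      Total |      1,268      100.00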

The variable sex has two categories, numbered 0 and 1, but it isn't clear what the numbers represent: 0 could represent males and 1 females, or the reverse.

Let's assume that we checked and 0 represents males and 1 represents females. We can define a value label named sexlabel that makes this clear.

. label define sexlabel 0 "Male" 1 "Female"

We can type label list to view the definition of sexlabel.

. label list sexlabel
sexlabel:
           0 Male
           1 Female
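
If the text of a label needs to change later, the definition can be edited rather than recreated. The following is a minimal sketch, not part of the original example: the label name sexlabel2 and the shorter texts "M" and "F" are hypothetical, chosen so that sexlabel itself is left untouched.

```stata
* Sketch only: sexlabel2 and the texts "M"/"F" are hypothetical,
* used here so that the tutorial's sexlabel is left untouched.
capture label drop sexlabel2
label define sexlabel2 0 "Male" 1 "Female"

* modify changes the text of existing values (and can add new ones)
label define sexlabel2 0 "M" 1 "F", modify

* Confirm the updated definition
label list sexlabel2
```

Without the modify option, label define stops with an error when the label name is already defined. The add option allows new values to be added but does not change existing ones.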

Once we define the label, we must attach it to the variable.

. label values sex sexlabel
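
One way to confirm that the label is attached is to describe the variable, which reports the name of any value label attached to it. The sketch below collects the steps so far as they might appear in a do-file. It assumes the dataset URL used earlier is reachable.

```stata
* Load the example dataset used in this tutorial
use https://www.stata.com/users/youtube/rawdata.dta, clear

* Define the value label, then attach it to the variable
label define sexlabel 0 "Male" 1 "Female"
label values sex sexlabel

* The "Value label" column of the output should now show sexlabel for sex
describe sex
```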

Then we can tabulate sex to see the labels in a table.

. tabulate sex

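        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |        676       53.31       53.31
     Female |        592       46.69      100.00
------------+-----------------------------------
      Total |      1,268      100.00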
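
Attaching a value label changes only how the values are displayed. The variable still stores the numbers 0 and 1. The sketch below illustrates this under the same assumptions as the example (0 is male, 1 is female). It repeats the earlier steps so that it can run on its own.

```stata
* Repeat the earlier steps so the sketch is self-contained
use https://www.stata.com/users/youtube/rawdata.dta, clear
label define sexlabel 0 "Male" 1 "Female"
label values sex sexlabel

* nolabel shows the underlying numeric codes instead of the label text
tabulate sex, nolabel

* Conditions still refer to the stored numbers, not the label text
count if sex == 1
```

Going by the table above, the count should match the 592 observations labeled Female.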

Now we can save our dataset.

. save mydata
file mydata.dta saved
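
Note that save stops with an error if mydata.dta already exists in the working directory. When rerunning these steps, a sketch like the following may be more convenient. It uses the replace option to overwrite the existing file, then reloads the file to check that the label definition was stored with the data.

```stata
* Repeat the earlier steps so the sketch is self-contained
use https://www.stata.com/users/youtube/rawdata.dta, clear
label define sexlabel 0 "Male" 1 "Female"
label values sex sexlabel

* replace overwrites mydata.dta if it already exists
save mydata, replace

* Reload the saved file and check that the label traveled with the data
use mydata, clear
label list sexlabel
```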

You can watch a demonstration of these commands by clicking on the link to the YouTube video below. You can read more about these commands by clicking on the links to the Stata manual entries below.