How to convert string data to numeric data

Sometimes, data that look like numbers are actually stored as strings. We will need to convert these variables to numeric data before we can use them with Stata's statistical features.

Let's begin by opening an example dataset from the Stata website and listing the first five observations for the variable chol.

. use https://www.stata.com/users/youtube/rawdata.dta, clear
(Fictitious data based on the National Health and Nutrition Examination Survey)

. list chol in 1/5

     +------+
     | chol |
     |------|
  1. |  280 |
  2. |  280 |
  3. |  219 |
  4. |  198 |
  5. |  231 |
     +------+

The data for chol appear to be numbers. Let's type summarize chol to compute some descriptive statistics.

. summarize chol

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        chol |          0

The output shows 0 observations, and the mean, standard deviation, minimum, and maximum are empty. This is our first clue that chol may be stored as a string variable. We can verify this by describing the data.

. describe chol

Variable      Storage   Display    Value
    name         type    format    label      Variable label
chol            str3    %9s                   serum cholesterol (mg/dL)

The Storage type for the variable chol is "str3". This means that chol is stored as a string variable that holds up to three characters. We can create a numeric variable named choln from chol using destring.

. destring chol, gen(choln)
chol: all characters numeric; choln generated as int

Now type list chol choln in 1/5.

. list chol choln in 1/5

     +--------------+
     | chol   choln |
     |--------------|
  1. |  280     280 |
  2. |  280     280 |
  3. |  219     219 |
  4. |  198     198 |
  5. |  231     231 |
     +--------------+

The data look the same, but we can use describe to verify that choln is stored as an "int" numeric variable. You can type help data_types to learn more about the different numeric storage types.

. describe chol choln

Variable      Storage   Display    Value
    name         type    format    label      Variable label
chol            str3    %9s                   serum cholesterol (mg/dL)
choln           int     %10.0g                serum cholesterol (mg/dL)

We can also type summarize chol choln to verify that choln works with Stata's statistical features.

. summarize chol choln

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        chol |          0
       choln |      1,268    216.5418     46.88068        89        426
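
By default, gen() leaves the original string variable in place. If we would rather overwrite the string variable, destring also has a replace option. Here is a minimal sketch that tries it on a copy so that chol itself stays untouched for the steps that follow. The name chol2 is made up for this illustration and is not part of the example dataset.

. generate chol2 = chol

. destring chol2, replace

. describe chol2

If the conversion works the same way it did for choln, describe should report a numeric storage type for chol2.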

Sometimes, numeric data include symbols such as "%" or "$". You can tell destring to ignore these symbols using the ignore() option.

There is also a related command named tostring that converts numeric data to string data. Let's convert choln back to a string to see how it works.

. tostring choln, gen(chols)
chols generated as str3

Now let's list and describe the three variables to check our work.

. list chol choln chols in 1/5

. describe chol choln chols

Variable      Storage   Display    Value
    name         type    format    label      Variable label
chol            str3    %9s                   serum cholesterol (mg/dL)
choln           int     %10.0g                serum cholesterol (mg/dL)
chols           str3    %9s                   serum cholesterol (mg/dL)

The raw data look the same for all three variables, but, as we have learned, the storage type is important. And now we know how to convert between types when necessary.
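
The ignore() option mentioned earlier can be sketched with a different dataset. The example below assumes that destring2, an example dataset used in the Stata manual entry for destring, can be loaded with webuse, and that it contains string variables named price and percent whose values include "$", ",", and "%" characters. The new variable names price2 and percent2 are our own choices.

. webuse destring2, clear

. describe price percent

. destring price percent, gen(price2 percent2) ignore("$ ,%")

. summarize price2 percent2

Characters listed in ignore() are stripped before conversion, so it is worth listing a few observations of the old and new variables side by side to confirm that nothing meaningful was removed.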

You can watch a demonstration of these commands by clicking on the link to the YouTube video below. You can read more about these commands by clicking on the links to the Stata manual entries below.