How to create new variables

Sometimes, we wish to create variables that are functions of other variables. For example, we may need to calculate body mass index (BMI) using height and weight.

Let's begin by opening and describing an example dataset from the Stata website.

. use https://www.stata.com/users/youtube/rawdata.dta, clear
(Fictitious data based on the National Health and Nutrition Examination Survey)

. describe

Contains data from https://www.stata.com/users/youtube/rawdata.dta
 Observations:         1,268                  Fictitious data based on the
                                                National Health and Nutrition
                                                Examination Survey
    Variables:            10                  6 Jul 2016 11:17
                                              (_dta has notes)
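--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
id              str6    %9s                   Identification Number
age             byte    %9.0g
sex             byte    %9.0g                 Sex
race            str5    %9s                   Race
height          float   %9.0g                 height (cm)
weight          float   %9.0g                 weight (kg)
sbp             int     %9.0g                 Systolic blood pressure (mm/Hg)
dbp             int     %9.0g                 Diastolic blood pressure (mm/Hg)
chol            str3    %9s                   serum cholesterol (mg/dL)
dob             str18   %18s
--------------------------------------------------------------------------------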
Sorted by: id

The description tells us that the variable height is measured in centimeters (cm) and the variable weight is measured in kilograms (kg). We wish to calculate BMI, which is defined as weight in kilograms divided by the square of height measured in meters. Let's use Stata's generate command to create a new variable for height measured in meters. We simply divide height by 100 to convert centimeters to meters.

. generate heightm = height/100
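A derived variable does not inherit a label, so it can help to record its units right away. The following is a minimal do-file sketch of that step; the label text "height (m)" is our own choice and not part of the original dataset.

```stata
* load the example data and convert height from centimeters to meters
use https://www.stata.com/users/youtube/rawdata.dta, clear
generate heightm = height/100

* record the units (label text is an illustrative assumption)
label variable heightm "height (m)"

* compare the original and converted variables side by side
summarize height heightm
```

Comparing the two rows of the summarize output is a quick way to confirm that the converted values are one hundredth of the originals.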

Then we can create a variable for BMI using our new heightm variable.

. generate bmi = weight / heightm^2
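To see the arithmetic for a single case, the formula can be evaluated with Stata's display command. The person below (70 kg, 175 cm) is hypothetical and is not taken from the dataset.

```stata
* BMI for a hypothetical person: weight 70 kg, height 175 cm
* height is converted to meters before squaring
display 70 / (175/100)^2
```

The same expression, with variable names in place of the numbers, is what generate evaluates for every observation.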

Let's list the first five observations and summarize bmi to check our work.

. list weight height heightm bmi in 1/5

. summarize bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         bmi |      1,268    25.77892    5.241681   15.43519   53.11815
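Beyond inspecting the listing and summary by eye, the result can also be checked programmatically with Stata's assert command, which stops with an error if a condition fails for any observation. This is a sketch only; the plausibility bounds of 10 and 70 are our own illustrative assumption.

```stata
use https://www.stata.com/users/youtube/rawdata.dta, clear
generate heightm = height/100
generate bmi = weight / heightm^2

* bmi is stored as a float, so compare against the float-rounded expression
assert bmi == float(weight / heightm^2)

* illustrative plausibility bounds (an assumption, not from the original)
assert inrange(bmi, 10, 70)

* no observation should be missing a BMI value
count if missing(bmi)
assert r(N) == 0
```

If every assertion holds, the do-file runs silently to the end.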

Note that we could have divided height by 100 and created the bmi variable with one generate command.

. generate bmi2 = weight / (height/100)^2

. summarize bmi bmi2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         bmi |      1,268    25.77892    5.241681   15.43519   53.11815
        bmi2 |      1,268    25.77892    5.241681   15.43518   53.11815
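The two minimums differ in the last displayed digit. The most likely reason is that heightm is stored as a float, so the two-step approach rounds the intermediate height before squaring it, whereas the one-step expression is evaluated in double precision throughout. One way to examine this, sketched below with illustrative variable names ending in _d, is to store everything as double.

```stata
use https://www.stata.com/users/youtube/rawdata.dta, clear

* double-precision versions of both approaches (names are illustrative)
generate double heightm_d = height/100
generate double bmi_d     = weight / heightm_d^2
generate double bmi2_d    = weight / (height/100)^2

summarize bmi_d bmi2_d

* the two double-precision versions should agree to a tiny tolerance
assert reldif(bmi_d, bmi2_d) < 1e-12
```

Storing as double uses more memory per observation, so it is a tradeoff rather than a default recommendation.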

We could also have used Stata's replace command to overwrite an existing variable rather than generate a new one. Here we replace bmi2 using the same expression, so no values change.

. replace bmi2 = weight / (height/100)^2
(0 real changes made)

. summarize bmi bmi2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         bmi |      1,268    25.77892    5.241681   15.43519   53.11815
        bmi2 |      1,268    25.77892    5.241681   15.43518   53.11815
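The reason replace is needed at all is that generate will not overwrite a variable that already exists. The sketch below illustrates the difference; the use of capture to trap the error is our own addition for demonstration purposes.

```stata
use https://www.stata.com/users/youtube/rawdata.dta, clear
generate bmi2 = weight / (height/100)^2

* a second generate with the same name fails; capture traps the error
capture generate bmi2 = weight / (height/100)^2
display _rc    // nonzero return code: the variable is already defined

* replace overwrites the existing values in place
replace bmi2 = weight / (height/100)^2
```

Because the replacement expression matches the one used to create bmi2, replace reports that no real changes were made.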

You can watch a demonstration of these commands by clicking on the link to the YouTube video below. You can read more about these commands by clicking on the links to the Stata manual entries below.