
Scatterplots are a popular tool used to visualize the relationship between two continuous variables. You can use Stata's graph twoway scatter command to create simple scatterplots, or you can add options to make more sophisticated charts.

Let's begin by opening the nhanes2l dataset.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

We can use twoway scatter to create a simple scatterplot with age on the horizontal axis and body mass index (bmi) on the vertical axis.

. twoway (scatter bmi age)

Next, let's add a title to our graph. Note that I'm using the “triple slash” (///) to continue my command across two lines. This doesn't work in the Command window, but it is useful when writing long graph commands in do-files.

. twoway (scatter bmi age),                                     ///
     title("Scatterplot of age (years) and body mass index (BMI)")
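
If you prefer not to end every line with a triple slash, a do-file can instead switch the command delimiter to a semicolon with #delimit. Here is a minimal sketch of the same titled scatterplot written that way. It assumes the lines are run from a do-file, because #delimit is not allowed in the Command window.

```stata
* Run from a do-file; #delimit cannot be used in the Command window
webuse nhanes2l, clear

#delimit ;
twoway (scatter bmi age),
    title("Scatterplot of age (years) and body mass index (BMI)");
#delimit cr
```

The final #delimit cr restores the default behavior, where each command ends at the end of its line.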

Let's change the markers to tiny, green triangles.

. twoway (scatter bmi age,                                       ///
     mcolor(green) msize(tiny) msymbol(triangle)),               ///
     title("Scatterplot of age (years) and body mass index (BMI)")
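
If you aren't sure which names mcolor(), msize(), and msymbol() accept, graph query will list them. This is a small sketch meant for a do-file (the // comments are not allowed in the Command window), and the lists you see may vary with your Stata version and installed schemes.

```stata
* List the named styles accepted by the marker options used above
graph query colorstyle        // names accepted by mcolor()
graph query symbolstyle       // names accepted by msymbol()
graph query markersizestyle   // names accepted by msize()
```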

We can use the if qualifier to create a scatterplot for a subset of our sample. Let's type label list diabetes to view the categories of diabetes.

. label list diabetes
diabetes:
           0 Not diabetic
           1 Diabetic

I've added the qualifier if diabetes==0 to the scatterplot below so that we plot only the observations for people without diabetes.

. twoway (scatter bmi age if diabetes==0,                        ///
     mcolor(green) msize(tiny) msymbol(triangle)),               ///
     title("Scatterplot of age (years) and body mass index (BMI)")
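
Before plotting a subset, it can help to check how many observations the condition selects. The sketch below is my own addition, not part of the original walkthrough. It tabulates diabetes with missing values shown and counts the observations that would be plotted.

```stata
webuse nhanes2l, clear

* Tabulate diabetes status, including any missing values
tabulate diabetes, missing

* Count the observations that satisfy the same condition used in the plot
count if diabetes==0

* Stata treats missing values as larger than any number, so a condition such
* as diabetes!=0 would also select observations with missing diabetes status;
* adding !missing(diabetes) guards against that
count if diabetes!=0 & !missing(diabetes)
```

The condition diabetes==0 never matches a missing value, so the scatterplot above is unaffected. The guard matters only for conditions such as != or >.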

Let's add a second scatterplot for people with diabetes using small, red circles.

. twoway (scatter bmi age if diabetes==0,                        ///
     mcolor(green) msize(tiny) msymbol(triangle))                ///
     (scatter bmi age if diabetes==1,                            ///
     mcolor(red) msize(small) msymbol(circle)),                  ///
     title("Scatterplot of age (years) and body mass index (BMI)")
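
Another way to overlay the two groups is to split bmi into one variable per group with separate and plot each new variable as its own series. This is a sketch of an alternative, not the approach used in this walkthrough. The names bmi0 and bmi1 are the ones separate generates from the values of diabetes.

```stata
webuse nhanes2l, clear

* Create bmi0 (diabetes==0) and bmi1 (diabetes==1) from bmi
separate bmi, by(diabetes)

* Each y variable becomes its own series with its own marker style
twoway (scatter bmi0 age, mcolor(green) msize(tiny) msymbol(triangle))   ///
       (scatter bmi1 age, mcolor(red) msize(small) msymbol(circle)),     ///
       title("Scatterplot of age (years) and body mass index (BMI)")
```

Observations with missing diabetes status are left out of both new variables, which matches what the two if qualifiers do.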

Next, let's customize the legend so that we know which symbol represents which group.

. twoway (scatter bmi age if diabetes==0,                          ///
     mcolor(green) msize(tiny) msymbol(triangle))                  ///
     (scatter bmi age if diabetes==1,                              ///
     mcolor(red) msize(small) msymbol(circle)),                    ///
     title("Scatterplot of age (years) and body mass index (BMI)") ///
     legend(order(1 "No Diabetes" 2 "Diabetes")                    ///
     rows(1) position(12))
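
The position() suboption takes clock positions, so position(12) puts the legend above the plot region and position(6) puts it below. The sketch below is a variation on the command above that moves the legend to the bottom and adds axis titles. The xtitle() and ytitle() text is my own choice for illustration.

```stata
webuse nhanes2l, clear

* Same two-group scatterplot, with axis titles and the legend below the plot
twoway (scatter bmi age if diabetes==0,                            ///
     mcolor(green) msize(tiny) msymbol(triangle))                  ///
     (scatter bmi age if diabetes==1,                              ///
     mcolor(red) msize(small) msymbol(circle)),                    ///
     title("Scatterplot of age (years) and body mass index (BMI)") ///
     xtitle("Age (years)") ytitle("Body mass index (BMI)")         ///
     legend(order(1 "No Diabetes" 2 "Diabetes")                    ///
     rows(1) position(6))
```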

There are many other options that you can use to customize your scatterplots, and you can read about them in the manual. You can also watch a demonstration of these commands by clicking on the link to the YouTube video below.

See it in action

Watch Basic scatterplots in Stata.