Home  /  Stata News  /  Vol 35 No 3  /  In the spotlight: Using Python within Stata
The Stata News

«Back to main page

In the spotlight: Using Python within Stata

Did you know that you can use Python within Stata? Well, you can—you can use Python from within Stata to access data with Internet API calls, process JSON data, create three-dimensional graphics, use machine learning and artificial intelligence algorithms to find patterns in your data, and much more. And the best part is that you can seamlessly move between Stata and Python to perform tasks such as these right alongside the analyses that you already perform using Stata.

I recently started writing a series of blog posts to help you get started using Python from within Stata. The first post shows you how to set up Stata to use Python, and the other posts help you get started with Python and then demonstrate interesting ways you can use it. Before you read the blog posts, I want to show you one way that you can take advantage of the Python integration in Stata. Here I use Stata to estimate marginal predictions from a logistic regression model and then Python to create a three-dimensional surface plot of those predictions.

I first fit a logistic regression model using data from the National Health and Nutrition Examination Survey (NHANES). The dependent variable, highbp, is an indicator for high blood pressure, and I included the main effects and interaction of the continuous variables age and weight.


. webuse nhanes2, clear

. svy: logistic highbp c.age##c.weight
(running logistic on estimation sample

Survey: Logistic regression

Number of strata   =        31                Number of obs     =       10,351
Number of PSUs     =        62                Population size   =  117,157,513
                                              Design df         =           31
                                              F(   3,     29)   =       418.97
                                              Prob > F          =       0.0000

  Linearized
highbp Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
age 1.100678 .0088786 11.89 0.000 1.082718 1.118935
weight 1.07534 .0063892 12.23 0.000 1.062388 1.08845
c.age#c.weight .9993975 .0001138 -5.29 0.000 .9991655 .9996296
_cons .0002925 .0001194 -19.94 0.000 .0001273 .0006724
Note: _cons estimates baseline odds.

The estimated odds ratio for the interaction of age and weight is 0.9994, and the p-value for the estimate is 0.000. Interpreting this result is challenging because the odds ratio is essentially equal to the null value of 1, but the p-value is essentially 0. Is the interaction effect meaningful? It is often easier to determine this by looking at the predicted probability of the outcome at different levels of the covariates rather than looking only at this odds ratio.

The code block below uses margins to estimate the predicted probability of hypertension for all combinations of age and weight for values of age ranging from 20 to 80 years in increments of 5 and for values of weight ranging from 40 to 180 kilograms in increments of 5. The option saving(predictions, replace) saves the predictions to a dataset named predictions. I then use the dataset predictions, rename three variables in the dataset, and save it again.


webuse nhanes2, clear
svy: logistic highbp age weight c.age#c.weight
quietly margins, at(age=(20(5)80) weight=(40(5)180)) 
                 vce(unconditional) saving(predictions, replace)
use predictions, clear
rename _at1 age
rename _at2 weight
rename _margin pr_highbp
save predictions, replace

In a previous Stata News article, I used graph twoway contour to create a contour plot of the predicted probability of hypertension saved in predictions. Today, I'm going to use Python to create a three-dimensional surface plot of the predicted probability.

You can run the Python code below in a Stata do-file after you have set up Stata to use Python. The Python code block begins with python: and ends with end. I have included comments in the Python code to give you clues about the purpose of each collection of Python statements. You can read a detailed explanation of the Python code at the Stata Blog.


python:
# Import the necessary Python packages
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')

# Read (import) the Stata dataset "predictions.dta" 
# into a pandas data frame named "data"
data = pd.read_stata("predictions.dta")

# Define a 3-D graph named "ax"
ax = plt.axes(projection='3d')

# Specify the axis ticks
ax.set_xticks(np.arange(20, 100, step=10))
ax.set_yticks(np.arange(40, 220, step=40))
ax.set_zticks(np.arange( 0,   1, step=0.2))

# Specify the title and axis titles
ax.set_title("Probability of Hypertension by Age and Weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of Hypertension", rotation=90)

# Specify the view angle of the graph
ax.view_init(elev=30, azim=240)

# Render the graph
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'], 
    cmap=plt.cm.Spectral_r, 
    edgecolor='none')

# Save the graph    
plt.savefig("Margins3d_3.png", dpi=1200)
end

The resulting three-dimensional surface plot shows the predicted probability of hypertension for values of age and weight and allows us to visualize their interaction. The probabilities are indicated by the height of the surface on the z axis and by the color of the surface. Blue indicates lower probabilities of hypertension, and red indicates higher probabilities of hypertension.

All the Stata commands and Python code in this example can be run in the same do-file. And three-dimensional surface plots are just one of the many fun things you can do with Stata and Python.

Ready to try it for yourself? My blog posts will show you how to get started:

  1. Setting up Stata to use Python
  2. Three ways to use Python in Stata
  3. How to install Python packages
  4. How to use Python packages
  5. Three-dimensional surface plots of marginal predictions
  6. Using APIs and JSON data with Python
  7. Machine learning and artificial intelligence with Python

Stay tuned into the Stata Blog for more posts on exciting ways to use Python within Stata. And even better, explore ways to take advantage of Stata/Python integration in your own research.

— Chuck Huber
Director of Statistical Outreach

«Back to main page