Use Stata from within Python
Stata API functions to run Stata commands and access Stata data and returned results from Python
IPython magic command to use Stata from Jupyter Notebook
PyStata allows you to invoke Stata directly from any standalone Python environment and to call Python directly from Stata, greatly expanding Stata's Python integration features.
Features in PyStata include
the ability to use Stata from an IPython kernel-based environment like Jupyter Notebook, Spyder IDE, or PyCharm IDE;
the ability to use Stata from a Python shell started in a command-line environment, like the Windows Command Prompt, the macOS terminal, or the Unix terminal;
four IPython magic commands: stata, mata, pystata, and, in StataNow™, help;
a suite of API functions from within Python to run Stata commands and access Stata data and returned results.
These tools, together with the Stata Function Interface (sfi) module, allow users to easily integrate Stata's vast statistical and data management methods into any data science project using Python.
Imagine that a health provider is interested in studying the effect of a new hospital admissions procedure on patient satisfaction. They have monthly data on patients before and after the new procedure was implemented in some of their hospitals. The data are in nested JSON format, and the health provider uses Python as its data analysis tool, but they would like to use Stata's new difference-in-differences (DID) regression to analyze the effect of the new admissions procedure on the hospitals that participated in the program. The outcome of interest is patient satisfaction, satisfaction_score, and the treatment variable is procedure.
A portion of did.json is
{
    "hospital_id": "1",
    "month": "7",
    "records": [
        {
            "procedure": "New",
            "satisfaction_score": "4.1065269"
        }
    ]
}
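To make the nested layout concrete, here is a minimal plain-Python sketch of the flattening that the json_normalize call in did.py performs with pandas: each entry under "records" becomes one row, and the parent keys hospital_id and month are copied onto it. The sample list, the flatten helper, and the numeric conversions are assumptions added for illustration; they are not part of did.py.

```python
import json

# Hypothetical sample mirroring the portion of did.json shown above,
# wrapped in a list on the assumption that the file holds one such
# object per hospital and month.
sample = json.loads("""
[
  {
    "hospital_id": "1",
    "month": "7",
    "records": [
      {"procedure": "New", "satisfaction_score": "4.1065269"}
    ]
  }
]
""")


def flatten(entries):
    """Return one flat dict per nested record, copying the parent keys."""
    rows = []
    for entry in entries:
        for record in entry["records"]:
            row = dict(record)
            # The JSON stores every value as a string; convert the
            # identifiers and the score to numbers (an added assumption).
            row["hospital_id"] = int(entry["hospital_id"])
            row["month"] = int(entry["month"])
            row["satisfaction_score"] = float(record["satisfaction_score"])
            rows.append(row)
    return rows


rows = flatten(sample)
print(rows[0])
```

Each resulting row carries the four fields procedure, satisfaction_score, hospital_id, and month, which is the rectangular shape a Stata dataset expects.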
We use the API functions in a Python script, did.py, to interact with Stata. Some highlights of the code are
# Imports needed for reading and flattening the JSON file
import json
from pandas import json_normalize
# Set up Stata from within Python
import stata_setup
stata_setup.config("C:/Program Files/Stata18", "se")
# Import the JSON file into a pandas DataFrame
with open("did.json") as json_file:
    data = json.load(json_file)
data = json_normalize(data, 'records', ['hospital_id', 'month'])
# Load the pandas DataFrame into Stata, replacing any data in memory
from pystata import stata
stata.pdataframe_to_data(data, True)
# Run Stata commands in Python
stata.run('''
didregress (satisfaction_score) (procedure), ///
    group(hospital_id) time(month)
''', echo=True)
# Load Stata saved results to Python
r = stata.get_return()['r(table)']
# Use them in Python: row 0 is the estimate, rows 4 and 5 are the CI bounds
print("\n")
print("The treatment hospitals had a %5.2f-point increase." % (r[0][0]), end=" ")
print("The result is with 95%% confidence interval [%5.2f, %5.2f]." % (r[4][0], r[5][0]))
# Generate Stata graph in Python
stata.run("estat trendplots", echo=True)
stata.run("graph export did.svg, replace", quietly=True)
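The indices r[0][0], r[4][0], and r[5][0] used in did.py rely on the row order of r(table), whose first six rows hold the point estimate, standard error, test statistic, p-value, and lower and upper confidence limits. The sketch below gives those rows names so that the lookups read more clearly. The helper column_summary, the label "stat", and the placeholder numbers are hypothetical and are not output from didregress.

```python
# Labels for the first six rows of r(table); "stat" stands in for the
# t or z statistic, whichever the estimation command reports.
ROW_NAMES = ("b", "se", "stat", "pvalue", "ll", "ul")


def column_summary(table, column=0):
    """Map row labels to the values in one column of an r(table)-like array."""
    return {name: float(table[i][column]) for i, name in enumerate(ROW_NAMES)}


# Placeholder numbers for illustration only; they are NOT results
# from any dataset or from the didregress command.
placeholder = [[1.0], [0.5], [2.0], [0.05], [0.0], [2.0]]

est = column_summary(placeholder)
print("Estimate %5.2f, 95%% CI [%5.2f, %5.2f]" % (est["b"], est["ll"], est["ul"]))
```

Because the helper only indexes by row and column, it should work the same way on the array returned by stata.get_return()['r(table)'] as on the nested list used here.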
Here we run did.py, the script shown above, in Spyder.
The entire analysis is performed without leaving the Python environment, and with Stata's API functions, data and results flow seamlessly between Python and Stata.
The script can easily be executed in any Python environment, such as the Windows Command Prompt, the macOS terminal, or the Unix terminal. This method uses only the shell environment and does not invoke any GUI element of Stata.
python did.py > did.log
produces a log file, did.log, with output from didregress.
This method is useful for automating tasks in Windows. For example, the above script can be incorporated into a regularly scheduled task to handle new data.
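When the scheduled task itself is written in Python, the shell redirection can be reproduced with the standard library. The following is a minimal sketch, assuming a wrapper function named run_and_log (a hypothetical name) and assuming that error messages should be captured in the same log, which the plain "python did.py > did.log" command does not do.

```python
import subprocess
import sys


def run_and_log(script_path, log_path):
    """Run a Python script and write its output to a log file.

    Returns the script's exit code so that a scheduler can detect failures.
    """
    with open(log_path, "w") as log:
        completed = subprocess.run(
            [sys.executable, script_path],  # same interpreter as the caller
            stdout=log,
            stderr=subprocess.STDOUT,  # assumption: keep errors in the log too
        )
    return completed.returncode
```

A scheduled job could then call run_and_log("did.py", "did.log") and check that the returned exit code is 0 before using the log.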
For a detailed example using Stata in Jupyter Notebook or any Python environment that supports IPython, see Jupyter Notebook with Stata.
Learn more about using Python and Stata together.
See difference in differences (DID) and difference in difference in differences (DDD).