Configuration

The pystata Python package allows you to call Stata from within Python. Below, we list the programs and packages you will need to use the pystata package, and then we discuss different methods you can use to configure it.

Requirements

To call Stata from within Python by using the pystata package, the following combination is needed:

  • Stata 17 or later versions

  • Python 2.7 or 3.4 or later versions

Dependencies

To use the pystata package with full functionality, the following Python packages will need to be installed:

  • NumPy 1.9 or later versions

  • pandas 0.15 or later versions

    The NumPy and pandas packages are not required if you only plan to execute Stata commands by invoking the run() method in the stata module.

    However, they are required to invoke the methods in the stata module used to pass data and results between Stata and Python.

  • IPython 5.0 or later versions

    The IPython package is required if you want to use the magic commands.
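One way to see which of these optional packages are available in the current environment is to look them up without importing them. The sketch below uses only the standard library and assumes Python 3.4 or later; the helper name missing_packages is ours and is not part of pystata. It checks only whether each package can be found, not its version.

```python
import importlib.util


def missing_packages(names=("numpy", "pandas", "IPython")):
    """Return the names from `names` that Python cannot locate."""
    # find_spec returns None when a top-level module is not on the search path.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all three packages were found.
print(missing_packages())
```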

Configuration

The pystata Python package is shipped with Stata and located in the pystata subdirectory of the utilities folder in Stata’s installation directory. For example, if you install Stata in C:\Program Files\Stata18, then the pystata package will be located in the C:\Program Files\Stata18\utilities\pystata\ directory. The package is placed there for convenience, to avoid conflicts between official updates to Stata and updates to the pystata Python package. Stata’s installation directory is stored in the c(sysdir_stata) macro. You can type the following in Stata to view the name of this directory:

. display c(sysdir_stata)

If you try to import the pystata package in your Python environment before configuring it, Python raises an exception stating that no module is named pystata. Python cannot locate the package because it is stored in Stata’s installation directory, which is not on Python’s module search path. (You can inspect sys.path, from Python’s sys module, to see the list of directories in this search path.)
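To see this behavior for yourself, you can attempt the import and, if it fails, print the directories Python searched. This is a minimal sketch; the helper name try_import is hypothetical and exists only for illustration.

```python
import importlib
import sys


def try_import(name):
    """Return the imported module, or None if Python cannot find it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        print("import failed:", exc)
        return None


# If pystata is not yet on the search path, list where Python looked.
if try_import("pystata") is None:
    for entry in sys.path:
        print(repr(entry))
```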

There are, however, several ways to import the pystata package in the Python environment. Below, we show you four methods to configure the package. For simplicity, we will refer to the Stata installation directory as STATA_SYSDIR, meaning that the pystata subdirectory is located in the STATA_SYSDIR\utilities\ directory. When implementing one of the methods below, be sure to replace STATA_SYSDIR with the directory in which your copy of Stata is installed. If you get output similar to that shown below for your edition of Stata, it means that everything is configured properly.

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      18.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        [email protected]

Stata license: 10-user 4-core network perpetual
Serial number: 1
  Licensed to: Stata Developer
               StataCorp LLC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.

Otherwise, send your output to our technical support team at tech-support@stata.com.

Method 1: Installing via pip

To enable Python to find Stata’s installation path and the pystata package, we provide the Python module stata_setup. The config() function defined in this module locates the pystata package and configures Stata. It takes two required arguments: the first is Stata’s installation path, and the second is the edition to use. The edition argument can be one of mp, se, or be, which represent the Stata/MP, Stata/SE, and Stata/BE editions, respectively.

The simplest way to install this setup module is to use the Python package manager pip from the Python Package Index (PyPI). Open a Windows Command Prompt and type

> pip install --upgrade --user stata_setup

Or open a macOS or Unix terminal and type

$ pip install --upgrade --user stata_setup

This will install the stata_setup module and the dependencies for the pystata package.

Alternatively, you can install the stata_setup module from a downloaded source archive: stata_setup-0.1.3.zip for Windows, or stata_setup-0.1.3.tar.gz for Linux and macOS.

After you download the archive to your local drive, change into the directory that contains it. In the Windows Command Prompt, type

> pip install stata_setup-0.1.3.zip

Or in a macOS or Unix terminal, type

$ pip install stata_setup-0.1.3.tar.gz

Suppose your Stata is installed in STATA_SYSDIR and you have the Stata/MP edition. You can configure Stata within the Python environment as follows:

>>> import stata_setup
>>> stata_setup.config('STATA_SYSDIR', 'mp')

If Stata is configured correctly, stata_setup.config() displays the splash screen shown above, with Stata’s logo and initialization message. To suppress these messages, set the splash argument to False, as follows:

>>> stata_setup.config('STATA_SYSDIR', 'mp', splash=False)

By default, splash is True. This argument was added in version 0.1.3.
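A mistyped installation path is an easy mistake to make, so it can help to confirm that the directory has the expected layout before calling stata_setup.config(). The sketch below is one way to do that, assuming Python 3 and stata_setup 0.1.3 or later (for the splash argument); the helper names pystata_dir and configure are hypothetical and are not part of stata_setup.

```python
import os


def pystata_dir(stata_sysdir):
    """Return the expected pystata directory, or raise if it is missing."""
    path = os.path.join(stata_sysdir, "utilities", "pystata")
    if not os.path.isdir(path):
        raise FileNotFoundError("no pystata package under " + repr(stata_sysdir))
    return path


def configure(stata_sysdir, edition="mp", splash=True):
    """Check the directory layout, then hand off to stata_setup.config()."""
    pystata_dir(stata_sysdir)
    import stata_setup  # imported here so the layout check runs first
    stata_setup.config(stata_sysdir, edition, splash=splash)
```

You would then call configure('STATA_SYSDIR', 'mp'), replacing STATA_SYSDIR with your own installation directory.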

Method 2: Adding pystata to sys.path

The most direct way to locate the pystata package is to add the location of the utilities directory, which contains the pystata subdirectory, to Python’s module search path. In your Python environment, you can type

>>> import sys
>>> sys.path.append('STATA_SYSDIR/utilities')
>>> from pystata import config
>>> config.init('mp')

If pystata is configured correctly, config.init() returns without error and displays the splash screen shown above, with Stata’s logo and initialization message. To suppress those messages, set the splash argument to False. See The config module for more information.
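If the sys.path.append() call runs more than once in a session (for example, when a notebook cell is re-executed), the same directory is appended repeatedly. That is harmless, but a small guard keeps the list tidy. The helper name add_search_path below is hypothetical. On Windows, write the path with forward slashes or as a raw string such as r'C:\Program Files\Stata18\utilities'; in an ordinary Python 3 string literal, the \u in \utilities is read as the start of a Unicode escape and causes a syntax error.

```python
import sys


def add_search_path(directory):
    """Append `directory` to sys.path unless it is already listed."""
    if directory not in sys.path:
        sys.path.append(directory)
    return sys.path


# Replace STATA_SYSDIR with the directory in which Stata is installed.
add_search_path('STATA_SYSDIR/utilities')
```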

Method 3: Changing your current working directory

In the Python environment, the current working directory is automatically on the module search path, so you can also locate the pystata package by changing your current working directory to STATA_SYSDIR\utilities\.

>>> import os
>>> os.chdir('STATA_SYSDIR/utilities')
>>> from pystata import config
>>> config.init('mp')

Method 4: Editing PYTHONPATH

PYTHONPATH is an environment variable that stores a list of paths to be added to the default module search path when the Python environment is initialized. You can therefore add STATA_SYSDIR\utilities to PYTHONPATH to locate the pystata package directly, without having to manipulate sys.path or change your current working directory. You need to configure PYTHONPATH only once; after that, STATA_SYSDIR\utilities is added to the module search path by default. This is more convenient than the two previous methods, which require you to manipulate sys.path or change your current working directory every time you want to import the pystata package in the Python environment.

Windows users can use the following steps to set this environment variable:

  • On Windows 10, open the Control Panel, click on the System and Security link, and then click on the System link. Then click on the Advanced system settings link and select the Environment Variables… button. The process may differ on other versions of Windows.

  • Under the User variables section for your login ID, click on New…, enter PYTHONPATH for the variable name, and specify STATA_SYSDIR\utilities for the variable value.

  • Click on OK to close the New User Variable window, OK to close the Environment Variables window, and OK again to close the System Properties window.

Linux and macOS users can set it permanently in the ~/.bashrc or ~/.bash_profile file,

$ export PYTHONPATH=STATA_SYSDIR/utilities:$PYTHONPATH

or in your ~/.cshrc file,

$ setenv PYTHONPATH STATA_SYSDIR/utilities:${PYTHONPATH}

After you are done, you can check whether it was set successfully by typing in the Windows Command Prompt

> echo %PYTHONPATH%

or in a macOS or Unix terminal

$ echo $PYTHONPATH

Next, in your Python environment, you can type

>>> from pystata import config
>>> config.init('mp')

to check whether the pystata package was located and Stata was initialized successfully.
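The PYTHONPATH check can also be done from within Python, which confirms that the variable is visible to the interpreter you are actually running. This is a minimal sketch using only the standard library; the helper name pythonpath_entries is hypothetical.

```python
import os


def pythonpath_entries(environ=os.environ):
    """Split PYTHONPATH into its directories; return [] if it is unset."""
    raw = environ.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on macOS and Unix.
    return [entry for entry in raw.split(os.pathsep) if entry]


# STATA_SYSDIR/utilities (with your own path) should appear in this list.
for entry in pythonpath_entries():
    print(entry)
```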

Note

In contrast to the first configuration method, if you use any of the last three configuration methods, you will have to install the numpy, pandas, and ipython dependencies yourself. If you installed Python using a prepackaged distribution, you may already have them installed. If not, you can install them via the Windows Command Prompt by typing

> pip install --upgrade --user numpy pandas ipython

or via a macOS or Unix terminal by typing

$ pip install --upgrade --user numpy pandas ipython

See pip - The Python Package Installer for more information about installing Python packages.