To demonstrate the magic commands, we first configure the pystata Python package using the first method listed in Configuration. This method uses the configuration module stata_setup, which is available from the Python Package Index (PyPI), to locate the pystata package and initialize Stata.

Suppose we have Stata installed in C:\Program Files\Stata17\ and we use the Stata/MP edition. In this case, Stata can be initialized as follows:

[1]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17/", "mp")

  ___  ____  ____  ____  ____ ©
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: 10-user 4-core network perpetual
Serial number: 1
  Licensed to: Stata Developer
               StataCorp LLC

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000; see help set_maxvar.

The stata magic

The stata magic is used to execute Stata commands. It can be used as both a cell magic and a line magic command. You can type %%stata? to view its documentation.

[2]:
%%stata?

Docstring:

Execute one line or a block of Stata commands.

When the line magic command %stata is used, a one-line Stata command can be specified and executed, as it would be in Stata’s Command window. When the cell magic command %%stata is used, a block of Stata commands can be specified and executed all at once. This is similar to executing a series of commands from a do-file.

Cell magic syntax:

%%stata [-d DATA] [-f DFLIST|ARRLIST] [-force]
 [-doutd DATAFRAME] [-douta ARRAY] [-foutd FRAMELIST] [-fouta FRAMELIST]
 [-ret DICTIONARY] [-eret DICTIONARY] [-sret DICTIONARY] [-qui] [-nogr]
 [-gw WIDTH] [-gh HEIGHT]

Optional arguments:

  -d DATA               Load a NumPy array or pandas DataFrame
                        into Stata as the current working dataset.

  -f DFLIST|ARRLIST     Load one or multiple NumPy arrays or
                        pandas DataFrames into Stata as frames.
                        The arrays and DataFrames should be
                        separated by commas. Each array or
                        DataFrame is stored in Stata as a separate
                        frame with the same name.

  -force                Force loading of the NumPy array or pandas
                        DataFrame into Stata as the current working
                        dataset, even if the dataset in memory has
                        changed since it was last saved; or force
                        loading of the NumPy arrays or pandas DataFrames
                        into Stata as separate frames even if one or
                        more of the frames already exist in Stata.

  -doutd DATAFRAME      Save the dataset in memory as a pandas
                        DataFrame when the cell completes.

  -douta ARRAY          Save the dataset in memory as a NumPy
                        array when the cell completes.

  -foutd FRAMELIST      Save one or multiple Stata frames as pandas
                        DataFrames when the cell completes. The Stata
                        frames should be separated by commas. Each
                        frame is stored in Python as a pandas
                        DataFrame. The variable names in each frame
                        will be used as the column names in the
                        corresponding DataFrame.

  -fouta FRAMELIST      Save one or multiple Stata frames as NumPy
                        arrays when the cell completes. The Stata frames
                        should be separated by commas. Each frame is
                        stored in Python as a NumPy array.

  -ret DICTIONARY       Store current r() results into a dictionary.

  -eret DICTIONARY      Store current e() results into a dictionary.

  -sret DICTIONARY      Store current s() results into a dictionary.

  -qui                  Run Stata commands but suppress output.

  -nogr                 Do not display Stata graphics.

  -gw WIDTH             Set graph width in inches, pixels, or centimeters;
                        default is inches.

  -gh HEIGHT            Set graph height in inches, pixels, or centimeters;
                        default is inches.

Line magic syntax:

%stata stata_cmd

%%stata cell magic

The %%stata magic is used to execute Stata code within a cell.

We load the auto dataset for demonstration.

[3]:
%%stata
sysuse auto, clear
(1978 automobile data)

The first line in the cell is %%stata, which indicates that stata is used as a cell magic. In this line, one or more arguments can be specified to control the execution of the cell. Starting from the second line, one or multiple Stata commands can be specified.

A one-line command is executed in single-line mode, meaning that only the output is displayed. When multiple commands are specified, the block of code is executed from a temporary do-file, so the cell respects any notation allowed in a do-file, such as comments and delimiters. In this multi-line mode, the Stata commands are displayed together with their output.

[4]:
%%stata
/*
Describe the contents of the data
*/
describe

// summarize the variable mpg
summarize mpg

. /*
> Describe the contents of the data
> */
. describe

Contains data from C:\Program Files\Stata17/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2020 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: foreign

.
. // summarize the variable mpg
. summarize mpg

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

.

The %%stata magic preserves Stata’s state from one cell to the next. In other words, results generated in one cell can be accessed from the cells that follow. For example, in the next cell we can access the mean of mpg, which is stored in r(mean).

[5]:
%%stata
display as text "mean of mpg = " as result r(mean)

local x "This is a local macro"

. display as text "mean of mpg = " as result r(mean)
mean of mpg = 21.297297

.
. local x "This is a local macro"

.

We can also access the values of macros defined in the previous cell.

[6]:
%%stata
display "`x'"
This is a local macro

Arguments

The cell magic %%stata provides arguments to control the execution of Stata’s commands within the cell. With these arguments, for example, you can load data from Python into Stata, perform computations or estimation with Stata, and then pass Stata results back to Python, or vice versa. You can specify multiple arguments at once.

-ret DICTIONARY

-eret DICTIONARY

-sret DICTIONARY

These arguments push Stata’s current r(), e(), and s() results into Python as a dictionary. The keys are Stata’s macro and scalar names, and the values are their corresponding values. Stata’s matrices are converted into NumPy arrays.

In the following cell, we first run a linear regression and list the e() stored results. Then we store these results in Python as a dictionary named myeret.

[7]:
%%stata -eret myeret
reg mpg price i.foreign
ereturn list

. reg mpg price i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------

. ereturn list

scalars:
                  e(N) =  74
               e(df_m) =  2
               e(df_r) =  71
                  e(F) =  23.00749448574634
                 e(r2) =  .3932401256962295
               e(rmse) =  4.569638248831391
                e(mss) =  960.8663049714787
                e(rss) =  1482.593154487981
               e(r2_a) =  .3761482982510528
                 e(ll) =  -215.9083177127538
               e(ll_0) =  -234.3943376482347
               e(rank) =  3

macros:
            e(cmdline) : "regress mpg price i.foreign"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "mpg"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4

functions:
             e(sample)

.

Here are the contents of the myeret dictionary.

[8]:
myeret
[8]:
{'e(N)': 74.0,
 'e(df_m)': 2.0,
 'e(df_r)': 71.0,
 'e(F)': 23.007494485746342,
 'e(r2)': 0.39324012569622946,
 'e(rmse)': 4.569638248831391,
 'e(mss)': 960.8663049714787,
 'e(rss)': 1482.5931544879809,
 'e(r2_a)': 0.3761482982510528,
 'e(ll)': -215.90831771275379,
 'e(ll_0)': -234.39433764823468,
 'e(rank)': 3.0,
 'e(cmdline)': 'regress mpg price i.foreign',
 'e(title)': 'Linear regression',
 'e(marginsprop)': 'minus',
 'e(marginsok)': 'XB default',
 'e(vce)': 'ols',
 'e(depvar)': 'mpg',
 'e(cmd)': 'regress',
 'e(properties)': 'b V',
 'e(predict)': 'regres_p',
 'e(model)': 'ols',
 'e(estat_cmd)': 'regress_estat',
 'e(b)': array([[-9.59034169e-04,  0.00000000e+00,  5.24527100e+00,
          2.56505843e+01]]),
 'e(V)': array([[ 3.29592449e-08,  0.00000000e+00, -1.02918123e-05,
         -2.00142479e-04],
        [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         -0.00000000e+00],
        [-1.02918123e-05,  0.00000000e+00,  1.35394617e+00,
         -3.39072871e-01],
        [-2.00142479e-04, -0.00000000e+00, -3.39072871e-01,
          1.61691892e+00]])}
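
Once the results are in Python, they can be used like any other dictionary. The following is a minimal sketch, not part of pystata: it copies a few values from the output above into a stand-in dictionary and recomputes the standard errors and t statistics from e(b) and e(V). Nested lists stand in for the NumPy arrays (the same [i][j] indexing works on both), and the helper name coef_table is hypothetical.

```python
import math

# Stand-in for the dictionary produced by `%%stata -eret myeret`.
# Values are copied from the output above; nested lists stand in for
# the NumPy arrays (indexing with [i][j] works on both).
myeret = {
    "e(N)": 74.0,
    "e(df_r)": 71.0,
    "e(b)": [[-9.59034169e-04, 0.0, 5.24527100e+00, 2.56505843e+01]],
    "e(V)": [
        [3.29592449e-08, 0.0, -1.02918123e-05, -2.00142479e-04],
        [0.0, 0.0, 0.0, -0.0],
        [-1.02918123e-05, 0.0, 1.35394617e+00, -3.39072871e-01],
        [-2.00142479e-04, -0.0, -3.39072871e-01, 1.61691892e+00],
    ],
}


def coef_table(eret):
    """Return (coefficient, std. err., t) for each column of e(b).

    Hypothetical helper for illustration. Columns with zero variance,
    such as the base level of i.foreign here, get None for t.
    """
    b = eret["e(b)"][0]   # e(b) is a 1 x k row vector
    V = eret["e(V)"]      # e(V) is the k x k covariance matrix
    rows = []
    for i, coef in enumerate(b):
        se = math.sqrt(V[i][i])
        t = coef / se if se > 0 else None
        rows.append((coef, se, t))
    return rows


for coef, se, t in coef_table(myeret):
    print(coef, se, t)
```

The standard errors and t statistics computed this way should agree, up to rounding, with the regression table displayed by Stata above.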

When a command in the cell generates a Stata graph, the graph is displayed automatically in the IPython environment, such as Jupyter Notebook.

[9]:
%%stata
scatter mpg price
[Graph: scatterplot of mpg against price]

If the commands in the cell generate multiple graphs, only the last one is displayed in the output by default. To display them all, give each graph its own name with the name() option of Stata’s graphics commands.

[10]:
%%stata
scatter mpg price, name(a, replace)
histogram rep78, name(b, replace)

. scatter mpg price, name(a, replace)

. histogram rep78, name(b, replace)
(bin=8, start=1, width=.5)

.
[Graph a: scatterplot of mpg against price]
[Graph b: histogram of rep78]

Note

The graphs will be displayed after all the Stata commands within the cell have been executed, not immediately after the command that generated each graph.

This is partly because the total number of graphs created, as well as which graph should be displayed, is unknown until the cell has finished executing. It is also because the graphs are actually displayed within Python even though they are created in Stata. So, displaying the graph as soon as it is created by Stata would prevent the execution of the remaining Stata commands in the cell, and it would greatly increase the overhead of execution.

-nogr

By default, the graph generated by Stata will be displayed in the output. The -nogr argument suppresses the graph from the output, providing a slight reduction in the cell’s execution time. This is useful if you just want to save the Stata graph to disk instead of displaying it.

[11]:
%%stata -nogr
scatter mpg price, name(a, replace)
graph export a.png, replace

. scatter mpg price, name(a, replace)

. graph export a.png, replace
(file a.png not found)
file a.png written in PNG format

.

-gw WIDTH[in|px|cm]

-gh HEIGHT[in|px|cm]

By default, Stata’s graphics are displayed at the graph’s own size, which defaults to 5.5 inches wide by 4 inches high. These two arguments can be used to customize the dimensions of Stata’s graphs. You can specify the width, the height, or both. If only one is specified, the other dimension is calculated from the aspect ratio. Each dimension can be specified in inches (the default), pixels, or centimeters.

[12]:
%%stata -gw 7in
scatter mpg price
[Graph: scatterplot of mpg against price, displayed 7 inches wide]
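
As a worked example of the aspect-ratio rule, the sketch below computes the height implied by a given width. It assumes the graph keeps the default 5.5 by 4 inch aspect ratio; how pystata performs this calculation internally is not shown here, and the helper names split_dimension and missing_height are hypothetical.

```python
# Default Stata graph size stated above, in inches.
DEFAULT_WIDTH_IN = 5.5
DEFAULT_HEIGHT_IN = 4.0


def split_dimension(spec):
    """Split a value such as '7in', '400px', or '10cm' into (number, unit).

    A bare number is treated as inches, matching the documented default.
    """
    for unit in ("in", "px", "cm"):
        if spec.endswith(unit):
            return float(spec[: -len(unit)]), unit
    return float(spec), "in"


def missing_height(width_spec):
    """Height implied by a width when the default aspect ratio is kept."""
    width, unit = split_dimension(width_spec)
    return width * DEFAULT_HEIGHT_IN / DEFAULT_WIDTH_IN, unit


height, unit = missing_height("7in")
print(f"-gw 7in implies a height of about {height:.2f}{unit}")
```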

The graphics arguments -nogr, -gw, and -gh only apply to the current cell. To set the graphics dimensions permanently, use the %pystata magic command.

-qui

The -qui argument executes the Stata commands but suppresses their output. This is equivalent to specifying quietly in front of the command. Note that this setting is overridden for any Stata command prefixed with noisily.

[13]:
%%stata -qui
regress mpg price i.foreign

The next few arguments are used to load data from Python into Stata and vice versa. For information about loading data using the API functions, see the stata module.

-d DATA

This argument pushes a pandas DataFrame or a NumPy array to Stata, making it the current dataset in memory.

For a pandas DataFrame, the column names will be used as the variable names. If the column name is a valid Stata name, the name is used as is. On the other hand, if the column name is not a valid Stata name, a valid variable name is created using the makeVarName() method of the SFIToolkit class in the Stata Function Interface (sfi) module. When the column type of the DataFrame conforms to a Stata numeric variable type, this variable type will be used in Stata; otherwise, the column of the DataFrame will be converted into a string variable in Stata.

When a NumPy array is specified as the input dataset, v1, v2, … are used as the variable names in Stata. The variable types follow the same rule as above.
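
The naming rules above can be illustrated with a short sketch. It is a simplified approximation, not the logic of makeVarName(): it treats a name as valid when it has 1 to 32 characters drawn from ASCII letters, digits, and underscores and does not start with a digit, and it ignores Stata’s reserved words and non-ASCII letters. The helper names looks_like_stata_name and default_array_names are hypothetical.

```python
import re

# Simplified rule (an assumption for illustration): 1 to 32 characters,
# ASCII letters, digits, and underscores, not starting with a digit.
# Stata's reserved words and non-ASCII letters are not handled here.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def looks_like_stata_name(name):
    """Return True if the column name passes the simplified check."""
    return _NAME_RE.fullmatch(name) is not None


def default_array_names(ncols):
    """Names v1, v2, ... used for the columns of a NumPy array."""
    return [f"v{i}" for i in range(1, ncols + 1)]


# Column names that pass are used as is; the others would be converted.
for column in ["Name", "Age", "unit price", "2021sales"]:
    print(column, looks_like_stata_name(column))

print(default_array_names(5))
```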

[14]:
# create a pandas DataFrame from a Python dictionary
import pandas as pd

data = {'Name':['James', 'Jack', 'Tom', 'George'],
        'Age':[25, 31, 25, 37]}

df = pd.DataFrame(data)
df
[14]:
     Name  Age
0   James   25
1    Jack   31
2     Tom   25
3  George   37
[15]:
%%stata -d df
list

     +--------------+
     |   Name   Age |
     |--------------|
  1. |  James    25 |
  2. |   Jack    31 |
  3. |    Tom    25 |
  4. | George    37 |
     +--------------+
[16]:
# generate a random NumPy array with shape 1000x5
import numpy as np

np.random.seed(17)
npa = np.random.random((1000,5))
[17]:
%%stata -d npa -force
summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          v1 |      1,000    .5105258     .286903   .0000563   .9987037
          v2 |      1,000    .5061361    .2873697   .0012103   .9997834
          v3 |      1,000    .5002623    .2841705   .0004561   .9987615
          v4 |      1,000    .4889184    .2877739    .000095   .9997973
          v5 |      1,000     .502372    .2892453   .0023315   .9995737

In the above example, we loaded a NumPy array into Stata’s memory, making it the current dataset. We also specified the -force argument to clear the dataset in memory, which was previously loaded from a pandas DataFrame. See the -force argument for more information.

Because the NumPy array does not have column names, v1, v2, … are used as the variable names in Stata. But you can rename the variables with Stata’s rename command if you like.

The following two arguments are used to push the current dataset in memory to Python. This is helpful when you want to use Stata’s data management capabilities or you need Stata to generate some intermediate results, such as postestimation results.

-doutd DATAFRAME

This argument pushes Stata’s current dataset in memory to Python as a pandas DataFrame once the cell has finished executing. Stata’s variable names will be used as the column names in the DataFrame.

-douta ARRAY

This argument pushes Stata’s current dataset in memory to Python as a NumPy array once the cell has finished executing.

[18]:
%%stata -doutd df2
rename v* newv*

In the above cell, we first rename all the variables in memory, prefixing them with new, and then we export the dataset to Python as a pandas DataFrame named df2. Here are the first five rows of this DataFrame.

[19]:
df2.head()
[19]:
      newv1     newv2     newv3     newv4     newv5
0  0.294665  0.530587  0.191521  0.067900  0.786985
1  0.656334  0.637521  0.575603  0.039063  0.357814
2  0.945683  0.060045  0.864042  0.877291  0.051194
3  0.652419  0.551751  0.597513  0.483529  0.282988
4  0.297726  0.561509  0.396047  0.788701  0.418484

The following arguments are used to load Python’s data into Stata frames, and vice versa.

-f DFLIST|ARRLIST

This argument pushes one or multiple pandas DataFrames or NumPy arrays to Stata as frames. In Stata, the frames are created using the names of the DataFrames or arrays. The variable names and types follow the same rules as above. To push multiple DataFrames or arrays, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").

-foutd FRAMELIST

This argument pushes Stata’s frames in memory to Python as pandas DataFrames when the cell finishes executing. One or multiple frames can be specified. To push multiple frames, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").

-fouta FRAMELIST

This argument pushes Stata’s frames in memory to Python as NumPy arrays when the cell finishes executing. One or multiple frames can be specified. To push multiple frames, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").
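
To see why whitespace inside an unquoted list is a problem, consider how a shell-style tokenizer splits an argument line. The sketch below uses Python’s standard shlex module purely as an illustration; it is an assumption that the magic’s own argument parsing behaves similarly, and the helper split_list is hypothetical.

```python
import shlex


def split_list(value):
    """Split a comma-separated list and trim whitespace around each name."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Unquoted, with a space after the comma: the list breaks into two tokens.
unquoted = shlex.split("-f df, df2")
print(unquoted)

# Quoted: the whole list stays together as one token.
quoted = shlex.split('-f "df, df2"')
print(quoted)
print(split_list(quoted[1]))
```

Writing the list without whitespace, as in -f df,df2, avoids the issue without any quoting.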

In the following cell, we load the two pandas DataFrames created above, df and df2, into Stata as frames with the same names.

[20]:
%%stata -f df,df2
frame change df
list
frame change df2
list in 1/5

. frame change df

. list

     +--------------+
     |   Name   Age |
     |--------------|
  1. |  James    25 |
  2. |   Jack    31 |
  3. |    Tom    25 |
  4. | George    37 |
     +--------------+

. frame change df2

. list in 1/5

     +-----------------------------------------------------------+
     |     newv1       newv2       newv3       newv4       newv5 |
     |-----------------------------------------------------------|
  1. |   .294665   .53058676   .19152079   .06790036   .78698546 |
  2. | .65633352    .6375209   .57560289   .03906292    .3578136 |
  3. | .94568319   .06004468    .8640421   .87729053   .05119367 |
  4. | .65241862   .55175137   .59751325   .48352862   .28298816 |
  5. | .29772572   .56150891   .39604744   .78870071   .41848439 |
     +-----------------------------------------------------------+

.

-force

You cannot load a Python dataset into Stata if Stata’s dataset in memory has been changed since it was last saved. To force loading of the dataset, replacing the dataset in memory, use the -force argument with -d.

This argument can also be used with the -f argument when trying to load a pandas DataFrame or a NumPy array as a Stata frame, if the frame already exists in Stata. This argument will force loading into the frame, replacing the existing frame.

%stata line magic

The %stata line magic provides users a quick way to execute a single-line Stata command. Unlike with the cell magic, with the %stata line magic, the Stata command is specified immediately following the magic, on the same line. Arguments are not allowed with the line magic; if you want to specify an argument, use the cell magic instead.

[21]:
%stata codebook newv1

-------------------------------------------------------------------------------
newv1                                                               (unlabeled)
-------------------------------------------------------------------------------

                  Type: Numeric (double)

                 Range: [.00005633,.99870374]         Units: 1.000e-12
         Unique values: 1,000                     Missing .: 0/1,000

                  Mean: .510526
             Std. dev.: .286903

           Percentiles:     10%       25%       50%       75%       90%
                        .101457   .272727   .515693   .755822   .893824
[22]:
%%stata
codebook newv1

-------------------------------------------------------------------------------
newv1                                                               (unlabeled)
-------------------------------------------------------------------------------

                  Type: Numeric (double)

                 Range: [.00005633,.99870374]         Units: 1.000e-12
         Unique values: 1,000                     Missing .: 0/1,000

                  Mean: .510526
             Std. dev.: .286903

           Percentiles:     10%       25%       50%       75%       90%
                        .101457   .272727   .515693   .755822   .893824