To demonstrate the magic commands, we first configure the pystata Python package by using the first method listed in Configuration. This method uses the configuration module stata_setup, which is available in the Python Package Index (PyPI), to locate the pystata package and initialize Stata.
Suppose we have Stata installed in C:\Program Files\Stata17\ and we use the Stata/MP edition. In this case, Stata can be initialized as follows:
[1]:
import stata_setup
stata_setup.config("C:/Program Files/Stata17/", "mp")
___ ____ ____ ____ ____ ©
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ MP—Parallel Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: 10-user 4-core network perpetual
Serial number: 1
Licensed to: Stata Developer
StataCorp LLC
Notes:
1. Unicode is supported; see help unicode_advice.
2. More than 2 billion observations are allowed; see help obs_advice.
3. Maximum number of variables is set to 5,000; see help set_maxvar.
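A notebook that moves between machines may not find Stata in the same place everywhere. The following is a minimal sketch, not part of pystata, that checks its inputs before calling stata_setup.config(). The helper name init_stata and its boolean return value are assumptions made for illustration; the edition strings "mp", "se", and "be" are the ones stata_setup accepts.

```python
import os

VALID_EDITIONS = ("mp", "se", "be")


def init_stata(stata_dir, edition):
    """Hypothetical helper: return True if Stata was initialized, else False."""
    if edition not in VALID_EDITIONS:
        raise ValueError(f"edition must be one of {VALID_EDITIONS}, got {edition!r}")
    if not os.path.isdir(stata_dir):
        # The installation directory is missing on this machine.
        return False
    try:
        import stata_setup  # installed from PyPI
    except ImportError:
        return False
    stata_setup.config(stata_dir, edition)
    return True


# Returns False when the directory does not exist.
print(init_stata("/no/such/stata/dir", "mp"))
```

On the machine described above, the call would be init_stata("C:/Program Files/Stata17/", "mp").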
The stata magic
The stata magic is used to execute Stata commands. It can be used as both a cell magic and a line magic command. You can type %%stata? to view its documentation.
[2]:
%%stata?
Docstring:
Execute one line or a block of Stata commands.
When the line magic command %stata is used, a one-line Stata command can be specified and executed, as it would be in Stata’s Command window. When the cell magic command %%stata is used, a block of Stata commands can be specified and executed all at once. This is similar to executing a series of commands from a do-file.
Cell magic syntax:
%%stata [-d DATA] [-f DFLIST|ARRLIST] [-force]
[-doutd DATAFRAME] [-douta ARRAY] [-foutd FRAMELIST] [-fouta FRAMELIST]
[-ret DICTIONARY] [-eret DICTIONARY] [-sret DICTIONARY] [-qui] [-nogr]
[-gw WIDTH] [-gh HEIGHT]
Optional arguments:
-d DATA               Load a NumPy array or pandas DataFrame
                      into Stata as the current working dataset.
-f DFLIST|ARRLIST     Load one or multiple NumPy arrays or
                      pandas DataFrames into Stata as frames.
                      The arrays and DataFrames should be
                      separated by commas. Each array or
                      DataFrame is stored in Stata as a separate
                      frame with the same name.
-force                Force loading of the NumPy array or pandas
                      DataFrame into Stata as the current working
                      dataset, even if the dataset in memory has
                      changed since it was last saved; or force
                      loading of the NumPy arrays or pandas DataFrames
                      into Stata as separate frames even if one or
                      more of the frames already exist in Stata.
-doutd DATAFRAME      Save the dataset in memory as a pandas
                      DataFrame when the cell completes.
-douta ARRAY          Save the dataset in memory as a NumPy
                      array when the cell completes.
-foutd FRAMELIST      Save one or multiple Stata frames as pandas
                      DataFrames when the cell completes. The Stata
                      frames should be separated by commas. Each
                      frame is stored in Python as a pandas
                      DataFrame. The variable names in each frame
                      will be used as the column names in the
                      corresponding DataFrame.
-fouta FRAMELIST      Save one or multiple Stata frames as NumPy
                      arrays when the cell completes. The Stata frames
                      should be separated by commas. Each frame is
                      stored in Python as a NumPy array.
-ret DICTIONARY       Store current r() results into a dictionary.
-eret DICTIONARY      Store current e() results into a dictionary.
-sret DICTIONARY      Store current s() results into a dictionary.
-qui                  Run Stata commands but suppress output.
-nogr                 Do not display Stata graphics.
-gw WIDTH             Set graph width in inches, pixels, or centimeters;
                      default is inches.
-gh HEIGHT            Set graph height in inches, pixels, or centimeters;
                      default is inches.
Line magic syntax:
%stata stata_cmd
%%stata cell magic
The %%stata magic is used to execute Stata code within a cell.
We load the auto dataset for demonstration.
[3]:
%%stata
sysuse auto, clear
(1978 automobile data)
The first line in the cell is %%stata, which indicates that stata is used as a cell magic. In this line, one or more arguments can be specified to control the execution of the cell. Starting from the second line, one or multiple Stata commands can be specified.
A one-line command will be executed in single-line mode, meaning that only the output will be displayed. When specifying multiple commands, the block of code will be executed from a temporary do-file. This means the cell will respect notation allowed in a do-file, such as comments and delimiters. In this multiline mode, Stata commands will be displayed together with the output.
[4]:
%%stata
/*
Describe the contents of the data
*/
describe
// summarize the variable mpg
summarize mpg
. /*
> Describe the contents of the data
> */
. describe
Contains data from C:\Program Files\Stata17/ado\base/a/auto.dta
 Observations:            74                  1978 automobile data
    Variables:            12                  13 Apr 2020 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------
Sorted by: foreign
.
. // summarize the variable mpg
. summarize mpg
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41
.
The %%stata magic keeps the state of each cell. In other words, the results generated from the previous cell can be accessed from the succeeding cell. For example, in the next cell we can access the mean of mpg, which is stored in r(mean).
[5]:
%%stata
display as text "mean of mpg = " as result r(mean)
local x "This is a local macro"
. display as text "mean of mpg = " as result r(mean)
mean of mpg = 21.297297
.
. local x "This is a local macro"
.
We can also access the values of macros defined in the previous cell.
[6]:
%%stata
display "`x'"
This is a local macro
Arguments
The cell magic %%stata provides arguments to control the execution of Stata’s commands within the cell. With these arguments, for example, you can load data from Python into Stata, perform computations or estimation with Stata, and then pass Stata results back to Python, or vice versa. You can specify multiple arguments at once.
-ret DICTIONARY
-eret DICTIONARY
-sret DICTIONARY
These arguments push Stata’s current r(), e(), and s() results into Python as a dictionary. The keys are Stata’s macro and scalar names, and the values are their corresponding values. Stata’s matrices are converted into NumPy arrays.
In the following cell, we first run a linear regression and list the e() stored results. Then we store these results in Python as a dictionary named myeret.
[7]:
%%stata -eret myeret
reg mpg price i.foreign
ereturn list
. reg mpg price i.foreign
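      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     23.01
       Model |  960.866305         2  480.433152   Prob > F        =    0.0000
    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932
-------------+----------------------------------   Adj R-squared   =    0.3761
       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------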
. ereturn list
scalars:
e(N) = 74
e(df_m) = 2
e(df_r) = 71
e(F) = 23.00749448574634
e(r2) = .3932401256962295
e(rmse) = 4.569638248831391
e(mss) = 960.8663049714787
e(rss) = 1482.593154487981
e(r2_a) = .3761482982510528
e(ll) = -215.9083177127538
e(ll_0) = -234.3943376482347
e(rank) = 3
macros:
e(cmdline) : "regress mpg price i.foreign"
e(title) : "Linear regression"
e(marginsok) : "XB default"
e(vce) : "ols"
e(depvar) : "mpg"
e(cmd) : "regress"
e(properties) : "b V"
e(predict) : "regres_p"
e(model) : "ols"
e(estat_cmd) : "regress_estat"
matrices:
e(b) : 1 x 4
e(V) : 4 x 4
functions:
e(sample)
.
Here are the contents of the myeret dictionary.
[8]:
myeret
[8]:
{'e(N)': 74.0,
'e(df_m)': 2.0,
'e(df_r)': 71.0,
'e(F)': 23.007494485746342,
'e(r2)': 0.39324012569622946,
'e(rmse)': 4.569638248831391,
'e(mss)': 960.8663049714787,
'e(rss)': 1482.5931544879809,
'e(r2_a)': 0.3761482982510528,
'e(ll)': -215.90831771275379,
'e(ll_0)': -234.39433764823468,
'e(rank)': 3.0,
'e(cmdline)': 'regress mpg price i.foreign',
'e(title)': 'Linear regression',
'e(marginsprop)': 'minus',
'e(marginsok)': 'XB default',
'e(vce)': 'ols',
'e(depvar)': 'mpg',
'e(cmd)': 'regress',
'e(properties)': 'b V',
'e(predict)': 'regres_p',
'e(model)': 'ols',
'e(estat_cmd)': 'regress_estat',
'e(b)': array([[-9.59034169e-04,  0.00000000e+00,  5.24527100e+00,
         2.56505843e+01]]),
'e(V)': array([[ 3.29592449e-08,  0.00000000e+00, -1.02918123e-05,
        -2.00142479e-04],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00],
       [-1.02918123e-05,  0.00000000e+00,  1.35394617e+00,
        -3.39072871e-01],
       [-2.00142479e-04,  0.00000000e+00, -3.39072871e-01,
         1.61691892e+00]])}
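Once the e() results are in Python, the matrices can be used like any other NumPy arrays. As a minimal sketch, the standard errors and t statistics in the regression table can be recovered from e(b) and e(V). The numbers below are retyped from the regression results so that the block runs on its own; in the notebook you would read them from myeret instead. The name myeret_subset is hypothetical.

```python
import numpy as np

# Values retyped from the regression results; in the notebook, use
# myeret["e(b)"] and myeret["e(V)"] directly.
b = np.array([[-9.59034169e-04, 0.0, 5.24527100e+00, 2.56505843e+01]])
V = np.array([
    [3.29592449e-08, 0.0, -1.02918123e-05, -2.00142479e-04],
    [0.0, 0.0, 0.0, 0.0],
    [-1.02918123e-05, 0.0, 1.35394617e+00, -3.39072871e-01],
    [-2.00142479e-04, 0.0, -3.39072871e-01, 1.61691892e+00],
])
myeret_subset = {"e(b)": b, "e(V)": V}

coef = myeret_subset["e(b)"].ravel()          # e(b) is a 1 x 4 row vector
se = np.sqrt(np.diag(myeret_subset["e(V)"]))  # standard errors

# The second column is the omitted base level of i.foreign: its coefficient
# and variance are both zero, so it is skipped to avoid dividing by zero.
keep = se > 0
t_stats = coef[keep] / se[keep]

for c, s, t in zip(coef[keep], se[keep], t_stats):
    print(f"coef={c: .6f}  se={s:.6f}  t={t: .2f}")
```

The three printed rows correspond to price, Foreign, and _cons in the regression table.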
When the command you specified in the cell generates a Stata graph, the graph will be automatically displayed in the IPython environment, such as Jupyter Notebook.
[9]:
%%stata
scatter mpg price
If multiple graphs are generated by the commands you issued from the cell, by default only the last one is displayed in the output. If you want to display them all, specify the name() option with Stata’s graphics commands.
[10]:
%%stata
scatter mpg price, name(a, replace)
histogram rep78, name(b, replace)
. scatter mpg price, name(a, replace)
. histogram rep78, name(b, replace)
(bin=8, start=1, width=.5)
.
Note
The graphs will be displayed after all the Stata commands within the cell have been executed, not immediately after the command that generated each graph.
This is partly because the total number of graphs created, as well as which graph should be displayed, is unknown until the cell has finished executing. It is also because the graphs are actually displayed within Python even though they are created in Stata. So, displaying the graph as soon as it is created by Stata would prevent the execution of the remaining Stata commands in the cell, and it would greatly increase the overhead of execution.
-nogr
By default, the graph generated by Stata will be displayed in the output. The -nogr argument suppresses the graph from the output, providing a slight reduction in the cell’s execution time. This is useful if you just want to save the Stata graph to disk instead of displaying it.
[11]:
%%stata -nogr
scatter mpg price, name(a, replace)
graph export a.png, replace
. scatter mpg price, name(a, replace)
. graph export a.png, replace
(file a.png not found)
file a.png written in PNG format
.
-gw WIDTH[in|px|cm]
-gh HEIGHT[in|px|cm]
By default, Stata’s graphics are displayed at the size of the graph itself, which defaults to a 5.5-inch width and a 4-inch height. These two arguments can be used to customize the dimensions of Stata’s graphs. You can specify the width or height or both. If only one is specified, the other dimension is calculated from the aspect ratio. The dimension can be specified in inches (the default), pixels, or centimeters.
[12]:
%%stata -gw 7in
scatter mpg price
The graphics arguments -nogr, -gw, and -gh only apply to the current cell. To set the graphics dimensions permanently, use the %pystata magic command.
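To make the aspect-ratio rule concrete, here is a minimal sketch of the arithmetic for a graph at the default 5.5 by 4 inch size when only a width is given. The function names are hypothetical, and this only illustrates the calculation described above; it is not pystata's own code, and a graph with a different aspect ratio would give a different result.

```python
DEFAULT_WIDTH_IN = 5.5   # default graph width stated in the text
DEFAULT_HEIGHT_IN = 4.0  # default graph height stated in the text


def parse_size(spec):
    """Split a size such as '7in', '400px', or '12cm' into (value, unit)."""
    spec = spec.strip().lower()
    for unit in ("in", "px", "cm"):
        if spec.endswith(unit):
            return float(spec[:-len(unit)]), unit
    return float(spec), "in"  # no suffix: inches, the documented default


def height_for_width(width_spec):
    """Scale the height to keep the default 5.5:4 aspect ratio."""
    value, unit = parse_size(width_spec)
    aspect = DEFAULT_HEIGHT_IN / DEFAULT_WIDTH_IN
    return value * aspect, unit


height, unit = height_for_width("7in")
print(f"{height:.2f}{unit}")
```

Under these assumptions, a 7-inch width corresponds to a height of about 5.09 inches.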
-qui
The -qui argument executes the Stata commands but suppresses their output. This is equivalent to specifying quietly in front of each command. Note that a command prefixed with noisily overrides this setting, so its output is still displayed.
[13]:
%%stata -qui
regress mpg price i.foreign
The next few arguments are used to load data from Python into Stata and vice versa. For information about loading data using the API functions, see the stata module.
-d DATA
This argument pushes a pandas DataFrame or a NumPy array to Stata, making it the current dataset in memory.
For a pandas DataFrame, the column names will be used as the variable names. If a column name is a valid Stata name, it is used as is. Otherwise, a valid variable name is created using the makeVarName() method of the SFIToolkit class in the Stata Function Interface (sfi) module. When the column type of the DataFrame conforms to a Stata numeric variable type, this variable type will be used in Stata; otherwise, the column of the DataFrame will be converted into a string variable in Stata.
When a NumPy array is specified as the input dataset, v1, v2, … are used as the variable names in Stata. The variable types follow the same rule as above.
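Before pushing a DataFrame to Stata, it can help to see which column names are likely to be rewritten. The sketch below uses a simplified approximation of Stata's naming rule for plain ASCII names (1 to 32 characters; letters, digits, or underscores; not starting with a digit). It is an assumption-laden check, not the logic of makeVarName(): it ignores Stata's reserved words and Unicode letters, and the function name is hypothetical.

```python
import re

# Simplified approximation of Stata's naming rule for ASCII names.
_STATA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def names_needing_conversion(columns):
    """Return the column names that do not match the simplified rule."""
    return [str(c) for c in columns if not _STATA_NAME.fullmatch(str(c))]


# Example column labels, including a space, a hyphen, and an integer label.
columns = ["Name", "Age", "2nd score", "price-usd", 0]
print(names_needing_conversion(columns))
```

Renaming the flagged columns in pandas first keeps the variable names in Stata predictable.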
[14]:
# create a pandas DataFrame from a Python dictionary
import pandas as pd
data = {'Name':['James', 'Jack', 'Tom', 'George'],
'Age':[25, 31, 25, 37]}
df = pd.DataFrame(data)
df
[14]:
     Name  Age
0   James   25
1    Jack   31
2     Tom   25
3  George   37
[15]:
%%stata -d df
list
     +--------------+
     |   Name   Age |
     |--------------|
  1. |  James    25 |
  2. |   Jack    31 |
  3. |    Tom    25 |
  4. | George    37 |
     +--------------+
[16]:
# generate a random NumPy array with shape 1000x5
import numpy as np
np.random.seed(17)
npa = np.random.random((1000,5))
[17]:
%%stata -d npa -force
summarize
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          v1 |      1,000    .5105258     .286903   .0000563   .9987037
          v2 |      1,000    .5061361    .2873697   .0012103   .9997834
          v3 |      1,000    .5002623    .2841705   .0004561   .9987615
          v4 |      1,000    .4889184    .2877739    .000095   .9997973
          v5 |      1,000     .502372    .2892453   .0023315   .9995737
In the above example, we loaded a NumPy array into Stata’s memory, making it the current dataset. We also specified the -force argument to clear the dataset in memory, which was previously loaded from a pandas DataFrame. See the -force argument for more information.
Because the NumPy array does not have column names, v1, v2, … are used as the variable names in Stata. But you can rename the variables with Stata’s rename command if you like.
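An alternative to renaming in Stata is to attach names on the Python side, because DataFrame column names are carried over as variable names. The sketch below wraps the same array in a pandas DataFrame; the column names x1 through x5 and the variable name named are chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Recreate the array from the example above.
np.random.seed(17)
npa = np.random.random((1000, 5))

# Hypothetical column names chosen for illustration.
names = ["x1", "x2", "x3", "x4", "x5"]
named = pd.DataFrame(npa, columns=names)

print(named.dtypes)
```

Loading named instead of npa should then produce variables x1 through x5 rather than v1 through v5.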
The following two arguments are used to push the current dataset in memory to Python. This is helpful when you want to use Stata’s data management capabilities or you need Stata to generate some intermediate results, such as postestimation results.
-doutd DATAFRAME
This argument pushes Stata’s current dataset in memory to Python as a pandas DataFrame once the cell has finished executing. Stata’s variable names will be used as the column names in the DataFrame.
-douta ARRAY
This argument pushes Stata’s current dataset in memory to Python as a NumPy array once the cell has finished executing.
[18]:
%%stata -doutd df2
rename v* newv*
In the above cell, we first rename all the variables in memory, prefixing them with new, and then we export the dataset to Python as a pandas DataFrame named df2. Here are the first five rows of this DataFrame.
[19]:
df2.head()
[19]:
      newv1     newv2     newv3     newv4     newv5
0  0.294665  0.530587  0.191521  0.067900  0.786985
1  0.656334  0.637521  0.575603  0.039063  0.357814
2  0.945683  0.060045  0.864042  0.877291  0.051194
3  0.652419  0.551751  0.597513  0.483529  0.282988
4  0.297726  0.561509  0.396047  0.788701  0.418484
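When checking an exported dataset against Stata's summarize output, note that Stata reports the sample standard deviation (dividing by n - 1). pandas uses the same convention by default, whereas NumPy's std() divides by n unless ddof=1 is passed. The sketch below shows the difference on a column regenerated from the same seed, so it runs without Stata; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Regenerate the first column of the array used above.
np.random.seed(17)
col = pd.Series(np.random.random((1000, 5))[:, 0], name="newv1")
n = len(col)

sd_pandas = col.std()                             # ddof=1, divides by n - 1
sd_numpy_default = np.std(col.to_numpy())         # ddof=0, divides by n
sd_numpy_sample = np.std(col.to_numpy(), ddof=1)  # matches the pandas value

print(sd_pandas, sd_numpy_default, sd_numpy_sample)
```

The two conventions differ by a factor of sqrt(n / (n - 1)), which is small for 1,000 observations but noticeable in small samples.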
The following arguments are used to load Python’s data into Stata frames, and vice versa.
-f DFLIST|ARRLIST
This argument pushes one or multiple pandas DataFrames or NumPy arrays to Stata as frames. In Stata, the frames are created using the names of the DataFrames or arrays. The variable names and types follow the same rules as above. To push multiple DataFrames or arrays, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").
-foutd FRAMELIST
This argument pushes Stata’s frames in memory to Python as pandas DataFrames when the cell finishes executing. One or multiple frames can be specified. To push multiple frames, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").
-fouta FRAMELIST
This argument pushes Stata’s frames in memory to Python as NumPy arrays when the cell finishes executing. One or multiple frames can be specified. To push multiple frames, specify the list separated by commas. There should be no whitespace between the names unless you enclose the list within single quotes (') or double quotes (").
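Because the list must be comma-separated with no whitespace, it can be convenient to build it from a Python list when a notebook is generated programmatically. The following is a small sketch of such a helper; frame_list_arg is a hypothetical name and is not part of pystata.

```python
def frame_list_arg(names):
    """Join names into a comma-separated list with no whitespace."""
    cleaned = [n.strip() for n in names]
    for n in cleaned:
        # Reject empty names and names that would break the list format.
        if not n or "," in n or any(ch.isspace() for ch in n):
            raise ValueError(f"not a usable name: {n!r}")
    return ",".join(cleaned)


print(frame_list_arg(["df", "df2"]))
```

The resulting string, df,df2, is the form expected after the frame arguments.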
In the following cell, we load the two pandas DataFrames created above, df and df2, into Stata as frames with the same names.
[20]:
%%stata -f df,df2
frame change df
list
frame change df2
list in 1/5
. frame change df
. list
     +--------------+
     |   Name   Age |
     |--------------|
  1. |  James    25 |
  2. |   Jack    31 |
  3. |    Tom    25 |
  4. | George    37 |
     +--------------+
. frame change df2
. list in 1/5
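     +--------------------------------------------------------------+
     |     newv1       newv2       newv3       newv4       newv5    |
     |--------------------------------------------------------------|
  1. |   .294665   .53058676   .19152079   .06790036   .78698546    |
  2. | .65633352    .6375209   .57560289   .03906292    .3578136    |
  3. | .94568319   .06004468    .8640421   .87729053   .05119367    |
  4. | .65241862   .55175137   .59751325   .48352862   .28298816    |
  5. | .29772572   .56150891   .39604744   .78870071   .41848439    |
     +--------------------------------------------------------------+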
.
-force
You cannot load a Python dataset into Stata if Stata’s dataset in memory has been changed since it was last saved. To force loading of the dataset, replacing the dataset in memory, use the -force argument with -d.
This argument can also be used with the -f argument when trying to load a pandas DataFrame or a NumPy array as a Stata frame, if the frame already exists in Stata. This argument will force loading into the frame, replacing the existing frame.
%stata line magic
The %stata line magic gives users a quick way to execute a single-line Stata command. Unlike with the cell magic, with the %stata line magic, the Stata command is specified immediately following the magic, on the same line. Arguments are not allowed with the line magic; if you want to specify an argument, use the cell magic instead.
[21]:
%stata codebook newv1

newv1 (unlabeled)

Type: Numeric (double)
Range: [.00005633,.99870374] Units: 1.000e-12
Unique values: 1,000 Missing .: 0/1,000
Mean: .510526
Std. dev.: .286903
Percentiles: 10% 25% 50% 75% 90%
.101457 .272727 .515693 .755822 .893824
[22]:
%%stata
codebook newv1

newv1 (unlabeled)

Type: Numeric (double)
Range: [.00005633,.99870374] Units: 1.000e-12
Unique values: 1,000 Missing .: 0/1,000
Mean: .510526
Std. dev.: .286903
Percentiles: 10% 25% 50% 75% 90%
.101457 .272727 .515693 .755822 .893824