Import data from Parquet files

Highlights

Apache Parquet is a columnar data file format that supports several compression methods for efficient storage. You can now use import parquet to import data from a Parquet file into Stata. import parquet reads a Parquet file into an Apache Arrow table and then converts the table to a Stata dataset. Most Parquet data types and compression methods are supported. This feature is part of StataNow™.

Let's see it work

We first look at the information contained in the Parquet file iris.parquet.

. import parquet using iris.parquet, describe
Contains data from C:/StataNow19/iris.parquet

 Observations:        150
    Variables:        5

Column          Type
sepal.length    double
sepal.width     double
petal.length    double
petal.width     double
variety         utf8

Now we import the Parquet data into Stata.

. import parquet using iris.parquet
(5 vars, 150 obs)

We can also import a subset of columns. For example, below we import only the columns sepal.length, sepal.width, and variety.

. import parquet sepal.length sepal.width variety using iris.parquet, clear
(3 vars, 150 obs)

We can also import a subset of rows from the Parquet file. Below, we import only the last 100 rows:

. import parquet sepal.length sepal.width variety using iris.parquet, rowrange(-100:L) clear
(3 vars, 100 obs)
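
The steps above can be collected into a short do-file. The following is only a sketch: it assumes iris.parquet is in the current working directory, and the output file name iris_subset is a hypothetical choice made for illustration. It uses only the import parquet syntax shown above together with standard Stata commands (assert, describe, save).

```stata
* Sketch of a do-file that repeats the steps above and checks the result.
* Assumes iris.parquet is in the current working directory.

* Inspect the file's columns and types without loading any data
import parquet using iris.parquet, describe

* Import three columns and only the last 100 rows, replacing data in memory
import parquet sepal.length sepal.width variety using iris.parquet, rowrange(-100:L) clear

* Confirm the dimensions match what the import reported (3 vars, 100 obs)
assert c(k) == 3
assert _N == 100

* Review the resulting variable names and storage types
describe

* Save as a Stata dataset; iris_subset is a hypothetical file name
save iris_subset, replace
```

The two assert lines stop the do-file with an error if the imported data do not have the expected number of variables and observations, which is a simple guard when the same script is rerun against an updated Parquet file.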

Tell me more

Read more about the import parquet command in [D] import parquet in the Stata Data Management Reference Manual.

Learn more about Stata's data management features.

View all the new features in Stata 19 and, in particular, what's new in data management.
