Import and export an H2O frame

Syntax

Import files into an H2O frame on the H2O cluster

    _h2oframe import impath, into(newframename)
            [h2oframe_import_options]

Upload local files into an H2O frame on the H2O cluster

    _h2oframe upload uppath, into(newframename)
            [h2oframe_upload_options]

Export current H2O frame to a delimited text file

    _h2oframe export [using] filename [if] [in]
            [, h2oframe_export_options]

Export subset of current H2O frame to a delimited text file

    _h2oframe export [columnlist] using filename [if] [in]
            [, h2oframe_export_options]

impath is the complete URL or normalized file path of the file(s) to be imported. impath can be the location of a single file or the path to a directory containing multiple files (of the same format) to be imported. If impath contains embedded spaces, enclose it in double quotes.

uppath is the normalized file path of the file to be uploaded; it is the location of the local file to be uploaded. If uppath contains embedded spaces, enclose it in double quotes.

columnlist is a list of column names in the current H2O frame; see Specifying a list of columns for more information.

filename is the destination .csv file.

 h2oframe_import_options                   Description
 -----------------------------------------------------------------------------------
 * into(newframename)                      destination H2O frame
   header(#)                               treat first line of data as data or column
                                             headers
   delimiter("char")                       use char as delimiter
   skipcols(numlist)                       skip the specified columns
   nastring(string)                        interpret the specified strings as
                                             missing values
   pattern(string)                         import file(s) that match the regular
                                             expression; applies only if impath is a
                                             folder
 -----------------------------------------------------------------------------------
 * into() is required.
 
 h2oframe_upload_options                   Description
 -----------------------------------------------------------------------------------
 * into(newframename)                      destination H2O frame
   header(#)                               treat first line of data as data or column
                                             headers
   delimiter("char")                       use char as delimiter
   skipcols(numlist)                       skip the specified columns
   nastring(string)                        interpret the specified strings as
                                             missing values
 -----------------------------------------------------------------------------------
 * into() is required.
 
 h2oframe_export_options                   Description
 -----------------------------------------------------------------------------------
   replace                                 overwrite existing filename
 -----------------------------------------------------------------------------------

Description

_h2oframe import loads files to an H2O cluster as an H2O frame. The data are loaded in parallel using multiple threads, which makes the import fast. The specified path can be a complete URL, a normalized path for the file(s), or a folder that contains the file(s). The path must be a valid cluster-side path; that is, it must be accessible from every node in the H2O cluster.

_h2oframe upload pushes a local file from disk to an H2O cluster as an H2O frame. In H2O jargon, it pushes data from the client to the cluster. The specified path must be a local path.

_h2oframe export exports an existing H2O frame to a .csv file on the local disk. Make sure you have enough disk space to accommodate the destination file because the H2O frame on the H2O cluster may be very large.

Options

Options for _h2oframe import

into(newframename) specifies the destination H2O frame into which the files are imported. into() is required.

header(#) specifies how to parse the first line of data. -1 means that the first line is parsed as data, and 1 means that the first line is parsed as column headers. 0 means to guess. The default is 0.

delimiter("char") allows you to specify a different separation character. For instance, if values in the file are separated by a semicolon, then you would specify delimiter(";"). Specify delimiter("\t") to use a tab character, or specify delimiter(" ") to use whitespace as a delimiter. The default is delimiter(",").

skipcols(numlist) specifies the columns to be skipped (in other words, not imported). The columns are specified as indices starting from 1.

nastring(string) specifies a list of strings to be interpreted as missing values.

pattern(string) specifies a regular expression used to match one or more files if impath is a folder. For example, specifying *.csv will import all .csv files in the specified folder to the H2O frame.
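
As a brief sketch of how these options combine, the following imports a semicolon-delimited file whose first line holds column headers. The path /data/sales.txt and the frame name sales are hypothetical; the path is assumed to be reachable from every node in the cluster.

     . _h2oframe import /data/sales.txt, into(sales) header(1) delimiter(";")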

Options for _h2oframe upload

into(newframename) specifies the destination H2O frame into which the file is uploaded. into() is required.

header(#) specifies how to parse the first line of data. -1 means that the first line is parsed as data, and 1 means that the first line is parsed as column headers. 0 means to guess. The default is 0.

delimiter("char") allows you to specify a different separation character. For instance, if values in the file are separated by a semicolon, then you would specify delimiter(";"). Specify delimiter("\t") to use a tab character, or specify delimiter(" ") to use whitespace as a delimiter. The default is delimiter(",").

skipcols(numlist) specifies the columns to be skipped (in other words, not uploaded). The columns are specified as indices starting from 1.

nastring(string) specifies a list of strings to be interpreted as missing values.
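
As a brief sketch, the following uploads a tab-delimited local file with a header line and omits its first and third columns. The file name mydata.tsv and the frame name mydata are hypothetical.

     . _h2oframe upload mydata.tsv, into(mydata) header(1) delimiter("\t") skipcols(1 3)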

Options for _h2oframe export

replace specifies that filename be replaced if it already exists.

Examples

 Read a file into the H2O cluster as an H2O frame named auto
     . _h2oframe import https://www.stata.com/examples/auto.csv, into(auto)

 Look at what we just loaded
     . _h2oframe get auto
     . list

 -----------------------------------------------------------------------------------
 Setup
     . sysuse auto, clear
     . export delimited auto.csv

 Upload auto.csv into the H2O cluster as an H2O frame named auto2
     . _h2oframe upload auto.csv, into(auto2)

 Look at what we just loaded
     . _h2oframe get auto2
     . list

 -----------------------------------------------------------------------------------
 Setup
     . _h2oframe change auto

 Export the whole H2O frame to myauto.csv
     . _h2oframe export myauto.csv

 -----------------------------------------------------------------------------------
 Setup
     . _h2oframe change auto

 Same as above, but export only a subset of the data. We use the replace option
 because myauto.csv already exists.
     . _h2oframe export make mpg rep78 foreign in 1/10 using myauto.csv, replace

Stored results

 _h2oframe import and _h2oframe upload store the following in r():

 Scalars
   r(N)                number of rows in the H2O frame
   r(k)                number of columns in the H2O frame
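
 As a brief sketch of how these results might be inspected, the following uploads
 the auto.csv file created in the examples above and displays the stored
 dimensions. The frame name auto3 is hypothetical.
     . _h2oframe upload auto.csv, into(auto3)
     . return list
     . display "rows: " r(N) ", columns: " r(k)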