Put Stata variables into an H2O frame and vice versa

Syntax

Save data in memory to an H2O frame on the H2O cluster

    _h2oframe put [varlist] [if] [in] , into(newframename) [put_options]

Load an existing H2O frame as the current Stata dataset

    _h2oframe get [using] framename [if] [in] [, get_options]

Load a subset of columns in an existing H2O frame as the current Stata dataset

    _h2oframe get columnlist using framename [if] [in] [, get_options]

varlist is a list of variable names in Stata’s current dataset.

columnlist is a list of column names in the H2O frame; see Specifying a list of columns for more information.

 put_options                               Description
 -----------------------------------------------------------------------------------
 * into(newframename)                      destination H2O frame
   nolabel                                 output numeric values (not labels) of
                                             labeled variables
   coltype(varlist, type)                  load variables into the H2O frame using
                                             the specified column type
   current                                 make the H2O frame the current (working)
                                             H2O frame
   replace                                 replace the H2O frame if it already
                                             exists
 -----------------------------------------------------------------------------------
 * into() is required.
 
 get_options                               Description
 -----------------------------------------------------------------------------------
 case(preserve|lower|upper)                preserve the case or read column names as
                                             lowercase (the default) or uppercase
 asfloat                                   load all floating-point data as floats
 asdouble                                  load all floating-point data as doubles
 clear                                     replace data in memory
 -----------------------------------------------------------------------------------

Description

_h2oframe put exports Stata’s current dataset to an H2O frame on the H2O cluster. Categorical variables with a value label attached are stored as type enum in the H2O frame; categorical variables without a value label are stored as type int. See What is an H2O frame? for more information about the data types in an H2O frame.

_h2oframe get loads an existing H2O frame to Stata as the current dataset. All enum (categorical) columns are stored as string variables in the dataset.

Options

Options for _h2oframe put

into(newframename) specifies the destination H2O frame in which to store the Stata variables. into() is required.

nolabel specifies that the numeric values of labeled variables be exported to the H2O frame rather than the label associated with each value.
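
As a rough illustration of nolabel, the sketch below exports the labeled variable foreign twice, once with and once without the option, so the resulting column types can be compared. It assumes an H2O cluster is already running and connected; the frame names auto_lab and auto_nolab are hypothetical.

     . sysuse auto, clear
     . _h2oframe put foreign, into(auto_lab)
     . _h2oframe put foreign, into(auto_nolab) nolabel
     . _h2oframe change auto_lab
     . _h2oframe describe
     . _h2oframe change auto_nolab
     . _h2oframe describe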

coltype(varlist, type) sets the column type for the specified variable(s) in the destination H2O frame. This option may be specified multiple times in a single command to set the column types for different variables. varlist specifies the variables, and type specifies their column type within the H2O frame. type may be one of numeric, string, time, enum, or uuid. Columns set to type numeric will be assigned type int or real by H2O, depending on their content. See What is an H2O frame? for more information about the column types of an H2O frame.
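
A minimal sketch of coltype(), specified twice in one command as described above. It assumes an H2O cluster is already running and connected; the frame name auto_types is hypothetical, and the choice of rep78 as enum and make as string is only for illustration.

     . sysuse auto, clear
     . _h2oframe put make rep78 foreign, into(auto_types) coltype(rep78, enum) coltype(make, string)
     . _h2oframe change auto_types
     . _h2oframe describe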

current sets the H2O frame as the current (working) H2O frame. This is the same as typing _h2oframe change newframename after the frame is created.

replace specifies that if an H2O frame with the same name as newframename already exists, its content will be replaced by the new H2O frame.
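
The following sketch combines replace and current. It assumes an H2O cluster is already running and connected; the frame name auto_demo is hypothetical. The second put overwrites the frame created by the first and makes it the working frame, so no separate _h2oframe change is needed before describing it.

     . sysuse auto, clear
     . _h2oframe put, into(auto_demo)
     . _h2oframe put make mpg, into(auto_demo) replace current
     . _h2oframe describe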

Options for _h2oframe get

case(preserve|lower|upper) specifies the case of the column names after loading. The default is case(lower).

asfloat loads floating-point data from the H2O frame as type float. If neither asfloat nor asdouble is specified, the storage type of these columns is determined by set type.

asdouble loads floating-point data from the H2O frame as type double. If neither asfloat nor asdouble is specified, the storage type of these columns is determined by set type.

clear specifies that it is okay to replace the data in memory, even though the current data have not been saved to disk.
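
A short sketch of the get options used together. It assumes an H2O cluster is already running and connected; the frame name auto_case is hypothetical. The column names are requested in uppercase and floating-point columns as doubles, and describe can then be used to inspect the resulting variable names and storage types.

     . sysuse auto, clear
     . _h2oframe put, into(auto_case)
     . _h2oframe get auto_case, case(upper) asdouble clear
     . describe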

Examples

 Setup
     . sysuse auto

 Export this dataset to an H2O frame named auto1
     . _h2oframe put, into(auto1)

 Describe the H2O frame we just created
     . _h2oframe change auto1
     . _h2oframe describe

 Export a subset of the data to another H2O frame named auto2 and then list
 the contents of the frame
     . _h2oframe put make mpg foreign in 1/50, into(auto2)
     . _h2oframe change auto2
     . _h2oframe list

 -----------------------------------------------------------------------------------
 Load the data from the H2O frame auto1 into Stata as the current dataset and
 then list the data
     . _h2oframe get auto1, clear
     . list

 -----------------------------------------------------------------------------------
 Same as above, but load only a subset of the columns and rows
     . _h2oframe get make mpg rep78 foreign using auto1 in 1/10, clear
     . list

Stored results

 _h2oframe get stores the following in r():

 Scalars
   r(N)                number of rows loaded from the H2O frame
   r(k)                number of columns loaded from the H2O frame
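
 One way to inspect these results after loading, sketched under the assumption that an H2O cluster is already running and connected (the frame name auto_res is hypothetical):

     . sysuse auto, clear
     . _h2oframe put, into(auto_res)
     . _h2oframe get make mpg using auto_res in 1/10, clear
     . return list
     . display "rows: " r(N) "  columns: " r(k)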