Start, connect, and query an H2O cluster

Syntax

Start a new H2O cluster or connect to an existing H2O cluster

    h2o init [, init_options]

Connect to an existing H2O cluster

    h2o connect [, connect_options]

Close the connection to an existing H2O cluster

    h2o disconnect

Shut down the H2O cluster

    h2o shutdown [, force]

Query current H2O cluster information

    h2o query [, detail]

Open H2O Flow UI

    h2o flow

Enable/disable H2O job progress bar

    h2o set progress { on | off }

Set the time zone on the H2O cluster

    h2o set timezone tz

List all time zones accepted by the H2O cluster

    h2o list timezones [pattern]

pattern is one of the following: *, _all, *name*, *name, or name*. Omitting pattern or specifying _all or * lists all time zones. Specifying *name* lists time zones whose names contain name. Specifying *name lists time zones whose names end with name. Specifying name* lists time zones whose names start with name.

 init_options                              Description
 -----------------------------------------------------------------------------------
 ip(string)                                IP address where the cluster is running
 port(#)                                   port number the cluster listens on
 nthreads(#)                               number of threads to use
 -----------------------------------------------------------------------------------
 
 connect_options                           Description
 -----------------------------------------------------------------------------------
 url(string)                               full URL of the cluster to connect to
 ip(string)                                IP address where the cluster is running
 port(#)                                   port number the cluster listens on
 -----------------------------------------------------------------------------------

Description

h2o provides utilities for accessing H2O from within Stata. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. With these utilities, users can start or connect to an H2O cluster to access H2O’s capabilities. See H2O intro for more discussion about H2O clusters.

h2o init attempts to connect to a local or remote H2O cluster by default. If one is not found, it starts a new local H2O cluster and connects to it. The remote cluster is specified by an IP address and a port number.

h2o connect connects to an existing local or remote H2O cluster. The remote cluster is specified by an IP address and a port number, or by a URL.

h2o disconnect closes the connection to the H2O cluster. The cluster keeps running, and you can reconnect to it using h2o connect. See Close and disconnect from the H2O cluster for more information.

h2o shutdown shuts down the H2O cluster from within Stata. The cluster is completely destroyed and all the resources within it are discarded. See Close and disconnect from the H2O cluster for more information.

h2o query lists the current H2O cluster information.

h2o flow opens H2O Flow UI in the browser.

h2o set progress sets whether to display the H2O execution progress. The execution progress is displayed as a percentage.

h2o set timezone sets the time zone on the H2O cluster.

h2o list timezones lists the time zones accepted by the H2O cluster, together with their aliases. If a pattern is specified, only the time zones that match it are listed.

Options

Options for h2o init

ip(string) specifies the IP address where the H2O cluster is running. The address is specified as a string of format #.#.#.#.

By default, h2o init checks whether an H2O cluster is running at localhost:54321 (IP address 127.0.0.1). When ip() is specified, h2o init checks the specified address instead. If a cluster is found, h2o init tries to connect to it. If the connection fails, h2o init launches a local H2O cluster at localhost:54321 and then connects to it.

port(#) specifies the port number the H2O cluster listens on. The default is 54321. The port number must be an integer between 1 and 65535.

nthreads(#) specifies the maximum number of parallel threads to use when launching the H2O cluster. This option is used only when Stata starts a local H2O cluster.
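
As an illustration of these options, the commands below sketch two common calls. The address 192.168.1.10 and the thread count 4 are hypothetical values chosen for the example, not defaults.

 Connect to a cluster at the hypothetical address 192.168.1.10, port 54321; start a local cluster if the connection fails
     . h2o init, ip("192.168.1.10") port(54321)

 Start a local cluster that uses at most 4 threads, assuming no cluster is already running at localhost:54321
     . h2o init, nthreads(4)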

Options for h2o connect

url(string) specifies the full URL of the H2O cluster to connect to. url() may not be specified with ip() or port().

There are two ways to connect to an existing H2O cluster: by specifying a full URL in the form ip:port, or by specifying the IP address and port number separately. If neither is specified, h2o connect checks whether an H2O cluster is running at localhost:54321 (IP address 127.0.0.1). If a cluster is found, h2o connect connects to it. Otherwise, an error is issued.

ip(string) specifies the IP address where the H2O cluster is running. The address is specified as a string of format #.#.#.#. ip() may not be specified with url().

If ip() is specified and an H2O cluster is running at this address, h2o connect tries to connect to that cluster. If the connection fails, h2o connect issues an error.

port(#) specifies the port number the H2O cluster listens on. The default is 54321. The port number must be an integer between 1 and 65535. port() may not be specified with url().
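
For instance, the two forms below are meant to reach the same cluster. The address 192.168.1.10 is hypothetical, and the url() value simply follows the ip:port form described above.

 Connect by URL
     . h2o connect, url("192.168.1.10:54321")

 Connect by IP address and port number
     . h2o connect, ip("192.168.1.10") port(54321)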

Options for h2o shutdown

force forces the H2O cluster to shut down from Stata.

Without force, h2o shutdown fails and issues the warning “…Shutting it down will discard all resources within the cluster…”. This is because shutting down the H2O cluster destroys the process that started it. Specifying the force option forces the H2O cluster to shut down and destroys everything within the cluster.

See Close and disconnect from the H2O cluster for more discussion.

Options for h2o query

detail displays summary information for each node in the cluster, in addition to the H2O cluster information.

Examples

 Launch a local H2O cluster
     . h2o init

 Query the H2O cluster information
     . h2o query

 Same as above, but also display each node's information
     . h2o query, detail

 List all available time zones on the H2O cluster
     . h2o list timezones

 Same as above, but list only the time zones whose names start with US
     . h2o list timezones US*
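
 Same as above, but match the end or the middle of a name; the search strings Paris and Pacific are illustrative, and the names returned depend on the cluster
     . h2o list timezones *Paris
     . h2o list timezones *Pacific*

 Set the cluster time zone to one of the listed names (UTC is assumed here to be among them), and then query the cluster, which stores the time zone in r(timezone)
     . h2o set timezone UTC
     . h2o query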

 Close the connection to the H2O cluster and reconnect to it
     . h2o disconnect
     . h2o connect

 Shut down the H2O cluster
     . h2o shutdown, force

Stored results

 h2o query stores the following in r():

 Scalars
   r(nodes)            number of nodes connecting to the H2O cluster
   r(total_cores)      total cores on the H2O cluster
   r(allowed_cores)    number of cores the client is allowed to use

 Macros
   r(url)              H2O connection URL
   r(version)          H2O version
   r(datatimezone)     H2O cluster data parsing time zone
   r(timezone)         H2O cluster time zone
   r(freemem)          H2O cluster free memory
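
 The stored results can be inspected and reused after h2o query. The commands below are a short sketch that uses only the results listed above; the wording of the displayed message is illustrative.

 List everything that h2o query stored in r()
     . h2o query
     . return list

 Display the connection URL, the H2O version, and the total number of cores
     . display "`r(url)' is running H2O version `r(version)'"
     . display r(total_cores)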