Highlights

  • Start a new H2O cluster, or connect to an existing H2O cluster from Stata

  • Manipulate data (H2O frames) on the H2O cluster from Stata

    • Create new H2O frames

    • Import or upload data files to new H2O frames

    • Put Stata's current dataset into a new H2O frame, or load H2O frames into Stata

    • Split, combine, and query H2O frames

  • Access the capabilities of H2O using various utility commands directly from Stata

In Stata, we have been experimenting with connecting to H2O. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. You can read more about H2O at docs.h2o.ai.

With the integration of H2O, you can start a new H2O cluster from Stata on your local machine through the command

. h2o init

or connect to a local or remote H2O cluster through

. h2o connect [, ip(#.#.#.#) port(#)]

You can access H2O's web UI, Flow, with

. h2o flow

Stata provides other utility commands to interact with the cluster; see Start, connect, and query an H2O cluster for details.

Once the cluster is started or connected, you can manipulate data (H2O frames) on the cluster using a suite of _h2oframe commands. For example, you can create new H2O frames; import or upload data files to new H2O frames; put Stata's current dataset into a new H2O frame; load H2O frames into Stata and save them locally; or split, combine, and query H2O frames from within Stata. You can also use the _h2oframe commands together with Stata's extensive data management commands for additional data-wrangling options. See Work with H2O frames for a complete list of commands.
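
As a minimal sketch of this workflow, the commands below start a local cluster, put Stata's current dataset into a new H2O frame, split that frame, and describe one of the pieces. The frame names auto, train, and test, the 80/20 split, and the seed are illustrative assumptions, not part of the text above. Because the feature is experimental, the exact syntax and options may differ in your version of Stata.

. * Illustrative only: frame names, split proportions, and seed are assumptions
. sysuse auto, clear
. h2o init
. * Copy the dataset in memory to a new H2O frame named auto
. _h2oframe put, into(auto)
. * Split the frame into two new frames on the cluster
. _h2oframe split auto, into(train test) split(0.8 0.2) rseed(19)
. * Make train the current H2O frame and summarize its columns
. _h2oframe change train
. _h2oframe describe

Here the data are prepared in Stata first and moved to the cluster only when needed, so Stata's own data management commands can be used before the _h2oframe put step.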

Although this integration is still experimental, we want to make it available for our users to try. Because it is experimental, its syntax and features are subject to change. Keep in mind that when a Stata command provides access to an H2O feature, that feature belongs to H2O: Stata supplies the command, but what the feature does is determined by H2O and is outside of Stata.

Reference

H2O.ai. (2021). H2O: Scalable Machine Learning Platform. Version 3.36.0.1. https://github.com/h2oai/h2o-3

Tell me more

See [P] H2O Intro.

See Stata and H2O integration.