In Stata 17, we have been experimenting with connecting to H2O. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. You can read more about H2O at docs.h2o.ai.
With the integration of H2O, you can start a new H2O cluster from Stata on your local machine through the command
. h2o init
or connect to a local or remote H2O cluster through
. h2o connect [, ip(#.#.#.#) port(#)]
You can access H2O's web UI, Flow, with
. h2o flow
Stata provides other utility commands to interact with the cluster; see Start, connect, and query an H2O cluster for details.
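As an illustration only, connecting to a cluster that is already running on the same machine might look like the following. The address 127.0.0.1 and port 54321 (H2O's usual default port) are assumed values, not taken from this page, and the example assumes that ip() accepts a dotted IPv4 address; substitute the address and port of your own cluster.
. h2o connect, ip(127.0.0.1) port(54321)
. h2o flow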
Once the cluster is started or connected, you can manipulate data (H2O frames) on the cluster using a suite of _h2oframe commands. For example, you can create new H2O frames; import or upload data files to new H2O frames; put Stata's current dataset into a new H2O frame; load H2O frames into Stata and save them locally; or split, combine, and query H2O frames from within Stata. You can also combine those _h2oframe commands with Stata's extensive data management commands for additional data wrangling. See Work with H2O frames for a complete list of commands.
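A minimal sketch of that workflow, starting a local cluster and putting Stata's current dataset into a new H2O frame, might look like the following. The subcommand names and the into() option shown here are assumptions based on our recollection of the _h2oframe suite, and the frame name auto is only an example. Because the feature is experimental, check Work with H2O frames for the exact syntax in your version before relying on it.
. sysuse auto, clear
. h2o init
. _h2oframe put, into(auto)
. _h2oframe change auto
. _h2oframe describe
Here the first command loads an example dataset into Stata, the third copies it to the cluster as the H2O frame auto, and the last two make that frame the current H2O frame and describe its contents.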
Although this integration is still experimental, we want to make it available for our users to try. Because it is experimental, its syntax and features are subject to change. When you use a Stata command that provides access to a feature of H2O, keep in mind that it is an H2O feature: the Stata command gives you access to it, but what the feature does is determined by H2O, not by Stata.
H2O.ai. (2021). H2O: Scalable Machine Learning Platform. Version 22.214.171.124.
See [P] H2O Intro.
See Stata and H2O integration.