Stata and H2O integration

In this documentation, we discuss how to integrate H2O into Stata. H2O is a scalable and distributed open-source machine learning and predictive analytics platform. You can read more about H2O at http://docs.h2o.ai/.

We have been experimenting with connecting to H2O from official Stata. Typically, we keep such experiments in-house until we either flesh them out into something we release to users or shelve them because they did not work out the way we wanted or our priorities changed.

We think H2O is an interesting platform, and we want both our users and ourselves to be able to explore connecting to it from Stata. So we are giving our users early access to our work. We welcome feedback. We expect to release some community-contributed packages in addition to the connection we have enabled from official Stata, and we hope users will do the same.

The main command used to interact with H2O is _h2oframe. Notice the underscore, which signifies that the command is intended mainly for programmatic use. For the most part, it doesn’t return output or helpful error messages, and its syntax is aimed at programmers rather than end users. It can serve as an engine for wrappers that provide user-friendly output, error messages, and the like. What _h2oframe does provide is access to H2O, along with returned results based on the actions it performs.

Syntax and features are subject to change. When _h2oframe provides access to a given feature of H2O, keep in mind that the feature belongs to H2O. Although you access it through a Stata command, what it does is up to H2O and happens outside of Stata.