Dive deeper into H2O
What do you do when traditional statistical models fall short? Harness high-performance predictive models powered by H2O without leaving your familiar Stata environment.
The introduction of the h2oml suite earlier this year was just the start. The suite features gradient boosting machines and random forests, along with tools for hyperparameter tuning, performance evaluation, prediction, and prediction explainability. We've been working to give you even more power and insight. Since then, we've released three blog posts detailing different ways you can take your machine learning analysis in Stata to the next level, and we've added three new postestimation commands in the most recent Stata 19 update, making it easier to interpret your models and diagnose potential issues.
Approximate statistical tests for comparing binary classifier error rates using H2OML
Uncertain whether your new model truly outperforms your baseline? The first blog post presents two approximate statistical tests for comparing binary classifier error rates: the McNemar test (McNemar 1947) and the combined 5 × 2 cross-validated (5 × 2 cv) F test (Alpaydin 1999). These tests give you the statistical rigor needed to confidently claim that one model performs better than another.
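To make the idea concrete, here is a small Python sketch (not the Stata workflow from the blog post) of the McNemar statistic with continuity correction. It uses only the discordant counts from a shared test set; the counts below are hypothetical:

```python
import math

def mcnemar(b, c):
    """Approximate McNemar test with continuity correction.
    b = cases classifier A got right and B got wrong; c = the reverse.
    Returns (chi-squared statistic, p-value); 1 degree of freedom."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 df: P(X > stat) = erfc(sqrt(stat/2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical counts: A beats B on 8 test cases, B beats A on 2
stat, p = mcnemar(8, 2)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With these counts the statistic is 2.5, which is not significant at the 5% level, illustrating why a seemingly large gap in raw accuracy can fail a formal test.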
Prediction intervals with gradient boosting machine
The second blog post tackles a critical question: how confident should you be in individual predictions? We construct prediction intervals for a gradient boosting machine, moving beyond single-point predictions to capture the uncertainty around each forecast.
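One common way to build such intervals (shown here as a generic Python/scikit-learn sketch on simulated data, not necessarily the construction used in the blog post) is to fit two quantile-loss boosters that bracket a central interval:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# Fit two quantile-loss boosters to bracket a central 90% prediction interval
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

# Check empirical coverage on fresh data drawn from the same process
Xnew = rng.uniform(0, 10, size=(200, 1))
ynew = np.sin(Xnew[:, 0]) + rng.normal(scale=0.3, size=200)
covered = np.mean((lo.predict(Xnew) <= ynew) & (ynew <= hi.predict(Xnew)))
print(f"empirical coverage: {covered:.2f}")
```

The coverage check at the end is the key diagnostic: a well-calibrated 90% interval should contain roughly 90% of new observations.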
Heterogeneous treatment-effect estimation with S-, T-, and X-learners using H2OML
The third blog post explores one of the most powerful applications of machine learning in causal inference: heterogeneous treatment effects. Does a treatment work equally well for everyone, or does it help some groups while harming others? We demonstrate how S-, T-, and X-learners can reveal subgroup-specific effects that traditional approaches might miss.
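The T-learner is the easiest of the three to sketch: fit separate outcome models for treated and control units, then estimate the conditional average treatment effect (CATE) as the difference of their predictions. The following Python illustration on simulated data (random forests stand in for the H2O learners; the data-generating process is our own) shows how it recovers an effect that exists only in one subgroup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-1, 1, size=(n, 1))
T = rng.integers(0, 2, size=n)            # randomized treatment indicator
tau = np.where(X[:, 0] > 0, 2.0, 0.0)     # true effect only when x > 0
y = X[:, 0] + tau * T + rng.normal(scale=0.5, size=n)

# T-learner: separate outcome models for treated and control units
mu1 = RandomForestRegressor(random_state=0).fit(X[T == 1], y[T == 1])
mu0 = RandomForestRegressor(random_state=0).fit(X[T == 0], y[T == 0])
cate = mu1.predict(X) - mu0.predict(X)    # estimated CATE per unit

print(f"mean CATE for x > 0:  {cate[X[:, 0] > 0].mean():.2f}")
print(f"mean CATE for x <= 0: {cate[X[:, 0] <= 0].mean():.2f}")
```

An overall average treatment effect would report roughly 1 here and mask the subgroup structure entirely, which is exactly the situation these metalearners are designed to expose.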
New postestimation commands
- h2omlgraph permimp graphs permutation variable importance, showing how much prediction accuracy drops when each predictor's values are randomly shuffled. This gives you an interpretable measure of which predictors drive your results.
- h2omlgraph rvfplot graphs the residuals against the fitted values for comprehensive model diagnostics. In machine learning, this diagnostic plot is useful for evaluating how well the model fits the data.
- h2omlgraph rvpplot graphs the residuals against a predictor, providing an additional perspective on model fit across values for a specified predictor.
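The mechanism behind permutation importance is straightforward to demonstrate. This Python sketch (a generic illustration on simulated data, not the h2omlgraph permimp implementation) shuffles one column at a time and records the drop in score; the predictor with the largest true coefficient produces the largest drop, while a pure-noise predictor barely moves the score:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))
# x0 is a strong predictor, x1 is weak, x2 is pure noise
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(random_state=0).fit(X, y)
base = r2_score(y, model.predict(X))

# Permutation importance: shuffle one column at a time and record
# how much the R-squared drops relative to the unshuffled baseline
drops = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    drops.append(base - r2_score(y, model.predict(Xp)))
    print(f"x{j}: score drop = {drops[-1]:.3f}")
```

Because shuffling breaks only the link between one predictor and the outcome, the drops rank predictors by how much the fitted model actually relies on them.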
Keep an eye out for more updates and resources as we continue to expand Stata’s h2oml toolkit.
References
Alpaydin, E. 1999. Combined 5 × 2 cv F test for comparing supervised classification learning algorithms. Neural Computation 11: 1885–1892. https://doi.org/10.1162/089976699300016007.

McNemar, Q. 1947. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12: 153–157. https://doi.org/10.1007/BF02295996.