Community corner: AI tools for Stata
The Stata community has a long history of extending the software’s capabilities through community-contributed tools. Today, that tradition is entering a new chapter as researchers and developers build innovative bridges between Stata and large language models (LLMs). This AI–Stata ecosystem is making coding faster, debugging easier, and complex analysis more accessible.
Benchmarks
Let’s begin with what’s already out there. Benchmarks test LLMs on Stata tasks to see which models produce the most accurate Stata code.
- StataBench: Developed by David Bann
- It tests models across 27 knowledge areas, including programming, data management, statistical analysis, and graphics.
- It tests local models, including any Ollama-compatible model, as well as cloud-based models from OpenAI, Anthropic, and DeepSeek.
- Access: GitHub (dbann/statabench).
- LLM Stata Benchmark: Developed by Khaled Eltokhy
- It tests 250 tasks, including programming, data management, and regression.
- It tests cloud-based models, including some that can also be run locally.
- Access: khaledeltokhy.com/benchmarks.
PyStata
Stata’s Python integration and the Stata Function Interface (sfi) are the engines behind many advanced AI implementations. LangChain is used to orchestrate multistep LLM workflows, while Instructor ensures that outputs conform to structured formats (such as JSON) that tools like Stata can parse. For researchers working with sensitive data, tools like Ollama and Hugging Face’s Transformers library enable deployment of private, local LLMs via Python backends, so no data needs to be sent to cloud-based APIs.
- The community-contributed command chatgpt was developed by Chuntao Li and Xueren Zhang (China Stata Club) using these tools. It lets users interact with ChatGPT from within Stata.
- Access: ssc install chatgpt.
- See Chuck Huber’s blog for an example of how to use PyStata to call various LLMs.
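To make the pattern described in this section concrete, here is a minimal Python sketch of the local-LLM route. It is not taken from any of the tools above. It assumes an Ollama server is running at its default local address, and the model name, the JSON keys "stata_code" and "explanation", and the macro name llm_code are all hypothetical choices made for illustration. The key check is done by hand as a simple stand-in for the kind of structured-output validation that Instructor provides.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a locally running Ollama server).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(question, model="llama3"):
    """Build a non-streaming request that asks the model for a JSON reply.

    The model name and the two JSON keys are illustrative assumptions.
    """
    prompt = (
        "You are a Stata assistant. Reply with a JSON object containing "
        'the keys "stata_code" and "explanation".\n\nQuestion: ' + question
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_reply(raw_text):
    """Validate the model's reply and return (stata_code, explanation)."""
    reply = json.loads(raw_text)
    if not isinstance(reply, dict):
        raise ValueError("reply is not a JSON object")
    missing = [k for k in ("stata_code", "explanation") if k not in reply]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return str(reply["stata_code"]), str(reply["explanation"])


def ask_local_model(question, model="llama3"):
    """Send the question to the local server; nothing leaves the machine."""
    data = json.dumps(build_payload(question, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Ollama returns the generated text in the "response" field.
    return parse_reply(body["response"])


def store_in_stata(stata_code):
    """Inside Stata, copy the code into a local macro; elsewhere, do nothing."""
    try:
        from sfi import Macro  # only importable from within Stata
    except ImportError:
        return False
    Macro.setLocal("llm_code", stata_code)  # hypothetical macro name
    return True
```

Run from a python: block in Stata, ask_local_model() followed by store_in_stata() would leave the suggested code in a local macro for review. Checking the reply before it reaches Stata means a malformed answer fails in Python rather than as a confusing Stata error.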
Specialized AI assistants
The community has built “bots” tailored specifically to Stata’s unique syntax and to econometric theory.
- The Econometrics Expert GPT: Developed by Solon Moreira, this custom GPT helps users review code and select appropriate econometric techniques.
- Estima: Developed by Josh Zweig, this Stata GPT offers a streamlined web interface for troubleshooting Stata code.
- Access: statagpt.com (subscription required).
Plugin architecture
Beyond simple API calls, researchers are using AI to optimize Stata’s backend performance.
- At the 2026 Portugal Stata Conference, Miguel Portela demonstrated how AI can help redesign commands using Stata’s plugin architecture (C/C++), yielding massive performance gains for complex estimators.
- Download materials: Stata UGM Lisbon (Feb 2026).
- Claude and MCP integration: Song Tan’s Stata-MCP project uses the Model Context Protocol to let AI agents run regressions and analyze data autonomously.
- Access: statamcp.com.
- IDE integration: Thomas Monk developed the Stata Workbench extension for VS Code and the mcp-stata server that powers it.
- YouTube tutorials: These tutorials walk you through developing your own integrations.
- Mikko Rönkkö: Gen AI in Stata with Visual Studio Code.
- Aniket Panjwani: How to Use Claude Code with Stata.
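For readers curious about what the MCP integrations mentioned above exchange under the hood, the Model Context Protocol is built on JSON-RPC 2.0, and an agent invokes a server-side tool with a tools/call request. The Python sketch below builds such a request and reads the text out of a reply. The tool name run_stata_command and its command argument are hypothetical and are not taken from Stata-MCP or mcp-stata; check each project's documentation for the tools it actually exposes.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def extract_text(response):
    """Return the text content of a tools/call reply, raising on errors."""
    if "error" in response:  # protocol-level failure (e.g. unknown tool)
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    text = "\n".join(
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError(text)
    return text


# Hypothetical tool name and argument, for illustration only.
request = make_tool_call(
    1, "run_stata_command", {"command": "regress price mpg weight"}
)
print(json.dumps(request, indent=2))
```

In practice, an MCP client library handles this messaging for you; the sketch is meant only to show that the agent sends a named tool and its arguments, and the server sends back text that the model can read.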
Join the conversation
These community-led projects ensure that Stata remains a cutting-edge tool. Have you developed an AI workflow? Join the discussion on Statalist, or share your projects on GitHub.