The Stata News, Vol 41 No 2


Community corner: AI tools for Stata

The Stata community has a long history of extending the software’s capabilities through community-contributed tools. Today, that tradition is entering a new chapter as researchers and developers build innovative bridges between Stata and large language models (LLMs). This AI–Stata ecosystem is making coding faster, debugging easier, and complex analysis more accessible.

Benchmarks

Let’s begin with what’s already out there. Benchmarks evaluate LLMs on Stata tasks to identify which models produce the most accurate Stata code.

  • StataBench: Developed by David Bann
    • It tests models across 27 knowledge areas, including programming, data management, statistical analysis, and graphics.
    • It tests local models (any Ollama-compatible model) as well as cloud-based models from OpenAI, Anthropic, and DeepSeek.
    • Access: GitHub (dbann/statabench).
    • Here are the current top performers:
  • LLM Stata Benchmark: Developed by Khaled Eltokhy
    • It covers 250 tasks spanning programming, data management, and regression.
    • It tests cloud-based models, including some that can also run locally.
    • Access: khaledeltokhy.com/benchmarks.
    • Here are the current top performers:
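To make the idea of a benchmark score concrete, here is a minimal Python sketch of how graded attempts could be aggregated into overall and per-area accuracy and then ranked. It is not the scoring code of StataBench or the LLM Stata Benchmark; the record layout and the function names are hypothetical, and deciding whether an answer passed (for example, by running the generated code or comparing it with a reference answer) is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One graded attempt: (model name, knowledge area, passed?).
# How "passed" is decided (running the code, matching a reference
# answer, or a human grade) is benchmark-specific and not modeled here.
Result = Tuple[str, str, bool]


def accuracy_by_model(results: List[Result]) -> Dict[str, float]:
    """Share of graded tasks each model passed, across all areas."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for model, _area, ok in results:
        total[model] += 1
        passed[model] += int(ok)
    return {model: passed[model] / total[model] for model in total}


def accuracy_by_area(results: List[Result], model: str) -> Dict[str, float]:
    """Per-area pass rate for one model, to show where it is weak."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for name, area, ok in results:
        if name == model:
            total[area] += 1
            passed[area] += int(ok)
    return {area: passed[area] / total[area] for area in total}


def leaderboard(results: List[Result]) -> List[Tuple[str, float]]:
    """Models sorted by overall accuracy, highest first (ties broken alphabetically)."""
    scores = accuracy_by_model(results)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Reporting per-area accuracy alongside the overall ranking is useful because a model that leads overall may still be weak in a specific area such as graphics.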

PyStata

Stata’s Python integration and the Stata Function Interface (sfi) power many of the more advanced AI applications. LangChain is used to orchestrate multistep LLM workflows, while Instructor ensures that LLM outputs conform to structured formats (such as JSON) that can be parsed reliably before they reach Stata. For researchers working with sensitive data, tools such as Ollama and Hugging Face’s Transformers library make it possible to run private, local LLMs through Python back ends, so data need not be sent to cloud-based APIs.
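The local-model-plus-structured-output pattern described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not how chatgpt or any particular tool works: it assumes an Ollama server running at its default address, http://localhost:11434, and uses Ollama’s /api/generate endpoint in JSON mode. The checks in parse_reply() are a hand-written stand-in for the kind of schema validation that Instructor automates, and build_request(), parse_reply(), and ask_local_model() are hypothetical names.

```python
import json
import urllib.request
from typing import Dict

# Ollama's default local endpoint; assumes `ollama serve` is running.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, task: str) -> Dict[str, object]:
    """Build a non-streaming Ollama request that asks for a JSON reply."""
    prompt = (
        "Write Stata code for the task below. Reply only with a JSON object "
        'that has two string fields, "code" and "explanation".\n\n'
        f"Task: {task}"
    )
    # "format": "json" turns on Ollama's JSON mode; "stream": False returns
    # the whole answer in one response instead of token-by-token chunks.
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_reply(raw: str) -> Dict[str, str]:
    """Check the model's text against the expected shape (hand-rolled schema)."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("code", "explanation"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    return {"code": data["code"], "explanation": data["explanation"]}


def ask_local_model(model: str, task: str) -> Dict[str, str]:
    """Query a local model, so the prompt never leaves this machine."""
    body = json.dumps(build_request(model, task)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        envelope = json.load(response)
    # In non-streaming mode, the generated text is in the "response" field.
    return parse_reply(envelope["response"])
```

For example, ask_local_model("llama3", "regress price on mpg and weight using auto.dta") would return a dictionary whose "code" field can then be handed to Stata; "llama3" is a placeholder for whichever model has been pulled locally. Because json.JSONDecodeError is a subclass of ValueError, malformed JSON and missing fields both surface as ValueError, so a caller can retry with a single except clause.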

  • The community-contributed command chatgpt, developed by Chuntao Li and Xueren Zhang (China Stata Club) using these tools, lets users interact with ChatGPT from within Stata.
    • Access: ssc install chatgpt.
  • See Chuck Huber’s blog for an example of how to use PyStata to call various LLMs.
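Chat-style LLM replies usually wrap code in a fenced block surrounded by explanation, so a PyStata workflow needs to pull out just the Stata code before running it. The sketch below assumes that convention; extract_stata_code(), init_stata(), and run_in_stata() are hypothetical helpers, and the Stata path and edition passed to stata_setup.config() are placeholders for a local installation (PyStata requires Stata 17 or later).

```python
from typing import List, Optional, Tuple

FENCE = "`" * 3  # built from parts so it cannot close this listing early


def extract_stata_code(reply: str) -> Optional[str]:
    """Return the body of the first fenced block tagged "stata" or untagged.

    Returns None if the reply contains no such block. A block that is
    opened but never closed is ignored.
    """
    blocks: List[Tuple[str, str]] = []
    current: Optional[List[str]] = None
    tag = ""
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if current is None:
                # Opening fence: remember the language tag, if any.
                tag = stripped[len(FENCE):].strip().lower()
                current = []
            else:
                # Closing fence: store the finished block.
                blocks.append((tag, "\n".join(current)))
                current = None
        elif current is not None:
            current.append(line)
    for block_tag, body in blocks:
        if block_tag in ("", "stata"):
            return body.strip()
    return None


def init_stata(path: str = "/usr/local/stata18", edition: str = "se") -> None:
    """Start Stata inside Python; call once per session (path and edition are placeholders)."""
    import stata_setup  # installed with: pip install stata_setup

    stata_setup.config(path, edition)


def run_in_stata(code: str) -> None:
    """Run a (possibly multi-line) block of Stata code through PyStata."""
    from pystata import stata  # importable only after init_stata() has run

    stata.run(code)
```

A typical round trip would call init_stata() once, then pass extract_stata_code(reply) to run_in_stata() whenever the result is not None. Returning None when no Stata block is found, rather than falling back to the whole reply, avoids sending prose to Stata as if it were code.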

Specialized AI assistants

The community has built “bots” tailored specifically to Stata’s unique syntax and to econometric theory.

  • The Econometrics Expert GPT: Developed by Solon Moreira, this custom GPT helps users review code and select appropriate econometric techniques.
  • Estima: Developed by Josh Zweig, this Stata GPT offers a streamlined web interface for troubleshooting Stata code.

Plugin architecture

Beyond simple API calls, researchers are using AI to optimize Stata’s backend performance.

Join the conversation

These community-led projects help keep Stata a cutting-edge tool. Have you developed an AI workflow? Join the discussion on Statalist, or share your projects on GitHub.
