st: A new professional application using Stata

Dear fellow Stata users,

Revisor, an audit case selection system for revenue agencies, is now out of beta. The system assigns a risk score to each taxpayer (including those never audited in the past) in order to produce optimal audit plans.

Revisor implements a two-step penalized Heckman procedure with cross-validation to correct selection bias and pick the best covariates (both in the selection equation and in the main, "compliance" equation). Among other statistical procedures, it also "groups" (bins) categorical and continuous variables to produce a scorecard and scores in the standard format.
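For readers less familiar with the two-step idea mentioned above, here is a minimal sketch of the correction term at its core: the inverse Mills ratio computed from a first-step probit index, which is then added as an extra regressor in the second-step equation for the selected (audited) cases. It is written in Python purely for illustration and is not Revisor's Mata code; the coefficients `gamma_hat` and the rows in `taxpayers` are made-up values, and the penalization and cross-validation steps are not shown.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills_ratio(index: float) -> float:
    """pdf(index) / cdf(index): the selection-correction term of the two-step method."""
    cdf = normal_cdf(index)
    if cdf == 0.0:
        # The CDF has underflowed; the ratio approaches -index as index -> -infinity.
        return -index
    return normal_pdf(index) / cdf


def selection_index(gamma, z_row):
    """Linear index z'gamma of the first-step (selection) probit."""
    return sum(g * z for g, z in zip(gamma, z_row))


# Hypothetical first-step probit estimates: a constant and one covariate.
gamma_hat = [-0.5, 1.2]

# Hypothetical selection-equation rows (constant, covariate) for three taxpayers.
taxpayers = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.5]]

# One correction term per taxpayer; in step two this enters the
# "compliance" regression alongside the other covariates.
lambdas = [inverse_mills_ratio(selection_index(gamma_hat, row)) for row in taxpayers]

for row, lam in zip(taxpayers, lambdas):
    print(row, round(lam, 4))
```

The ratio shrinks as the selection index grows: the more likely a case was to be selected, the smaller the correction it needs.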

What may interest some of you in the Stata community is that the specialized algorithms used in Revisor were coded in Mata, while more mundane tasks, such as producing histograms and ROC curves, are handled in standard Stata code. In Revisor, Stata is fully interfaced with a Java server, PostgreSQL and a Flex client GUI.

So Revisor is another example of a professional, self-contained application based on Stata (Adept by the World Bank also comes to mind). In the process of building Revisor, Stata proved to be a versatile and robust platform, and Mata a high-performance, flexible language for coding the trickiest of statistical algorithms. Whether with standard Stata (for the SaaS version) or with NBS (for the locally deployed version), we had no problem integrating Stata with the rest of the application.

Also, to my mind, Revisor shows the potential for Stata to venture into data mining/machine learning territory. Being econometricians ourselves, we have the highest respect for the tradition that Stata has built on. But "Big Data" and the rest of it are here, and the culture clash between econometricians/statisticians on the one hand and "data scientists" on the other is in fact very stimulating. It would be a pity for Stata not to seize the opportunity: Stata and especially Mata are perfect for writing decision trees (no, we don't need a graphic interface to prune a tree if you do it right, the Stata way: with cross-validation), neural networks and, most important of all, penalized estimators for efficient model selection.

Charles Vellutini

