Home  /  Resources & support  /  FAQs  /  Using a plugin to connect to a database

How do I connect to a database by using a Stata plugin?

Title   Using a plugin to connect to a database
Author Kevin Crow, StataCorp

ODBC vs. plugin

The easiest way to import data from a database directly into Stata is to use the odbc command. However, there are occasions where the odbc command will not work or is not the best solution for importing data. For example,

  • there is not an ODBC driver for the database in question, or
  • ODBC is too slow.

If you encounter any of the above problems, you can use a Stata plugin to import and export your data directly to your database if your database has an application programming interface (API). Most database applications have an API, so the only real question is how to connect Stata to the database by using the API.

This FAQ assumes that you have read and understood the FAQ on Stata plugins at the following URL:

http://www.stata.com/plugins/

The example will use ANSI C as the plugin langauge and gcc as the compiler. Some of the gcc compiler flags may be different from this example depending on your operating system. You can also use C++ or a different compiler. These issues are covered in the plugin FAQ.

Here I will connect Stata to a MySQL database. Even though this FAQ explains how to connect to a MySQL database, you should be able to extend the example to connect to other databases (e.g., Oracle, PostgreSQL, Sybase).


Create a test database

First, you need to create a test database in MySQL. If you have the privileges to create databases in MySQL, you can create a database with the MySQL command CREATE DATABASE stata_test. If you don’t have the privileges, you should contact your network administrator to create the database. After creating the database, you need to create a user that has the appropriate permissions to connect, query, etc., to the database stata_test (again, you might need your network administrator for this step). Depending on your version of MySQL, you should be able to give the correct permission to access the database by using the following MySQL command:

GRANT ALL PRIVILEGES ON stata_test.* TO 'user'@'localhost' IDENTIFIED BY 'password';

Note: Change user, localhost, and password in the above command.

Now that a user has been created with the correct permission, you can populate the database with the following SQL commands:

  DROP TABLE IF EXISTS stata_test ; 
  CREATE TABLE stata_test		
  (
          id              INT NOT NULL,
          var1            INT NOT NULL,
          var2            INT NOT NULL,
          var3            VARCHAR(1) NOT NULL,
	  PRIMARY KEY (id)
  ) TYPE = INNODB;

  INSERT INTO stata_test (id,var1,var2,var3) VALUES(1,2,3,'a'), (2,5,6,'b') ;

An easy way to run these commands is to put them into a text file called stata_test.sql and run the command

        mysql -u user -p stata_test < stata_test.sql

from your terminal. After typing your password to run the stata_test.sql file, you should have a database table in MySQL that looks like

  mysql> select * from stata_test;
 +----+------+------+------+
 | id | var1 | var2 | var3 |
 +----+------+------+------+
 |  1 |    2 |    3 | a    |
 |  2 |    5 |    6 | b    |
 +----+------+------+------+
 2 rows in set (0.00 sec)

Plugin code

With the database set up, you can now create your C Stata plugin. For this example, I am going to call my plugin connect_to_mysql, and it will have the following Stata syntax:

        connect_to_mysql [varlist] [if] [in], [create|load|write]
Options:
create creates the Stata dataset into which the data are to be loaded. You must use the create option before load.
load imports the data into an empty Stata dataset.
write exports data in Stata to an existing database table.

connect_to_mysql can load a table from a database, and it can write to an existing table in a database. The C code for the connect_to_mysql command is connect_to_mysql.c (you will have to change user and password in connect_to_mysql.c to connect to your database).

The plugin consists of three functions: get_results(), load_data(), and write_data(). The function get_results() connects to the database, queries the database, and returns the results. get_results() also creates three Stata macros: one that contains the names of the variables, one that contains the data types of the variables, and one that contains the total number of observations. These macros must be created because you cannot create observations and variables in Stata from a plugin. You must pass the macros back to Stata so that you can create the variables and set the number of observations.

load_data() loops over the MySQL results set and stores the data to Stata. write_data() loops over Stata observations and variables and writes data back to the database.


Compile plugin

To create C plugins, you will need to download two extra files: the Stata C source file, stplugin.c, and Stata C header file, stplugin.h.

You will also need to locate the path to the MySQL header file and mysqlclient libraries. Usually, these files are in the directory where you installed MySQL. For example, on a Linux machine, if I installed MySQL in /usr/local/mysql, then the header files would be in /usr/local/mysql/include/mysql/, and the library files would be in /usr/local/mysql/lib/mysql/. If you are not sure where the MySQL header file or libraries are, you can use the find command to locate them.

        find / -mount -name "mysql.h"
        find / -mount -name "libmysqlclient.so"

Once you know the paths to the MySQL header file and libraries, and you have downloaded the Stata source file and header file, you can compile the plugin on a 32-bit Linux machine by using gcc with the command:

$ gcc -shared -DSYSTEM=OPUNIX stplugin.c connect_to_mysql.c -o connect_to_mysql.plugin -I /usr/local/mysql/include/mysql/ -L /usr/local/mysql/lib/mysql/ -l mysqlclient

Call plugin from Stata

Now that the plugin has been compiled you can call it from Stata to load data from MySQL. The first step in Stata is to create an empty dataset to load the data by using the command plugin call connect_to_mysql *, create.

clear

program connect_to_mysql, plugin
plugin call connect_to_mysql, create

// Create dataset using macros passed back from plugin
local stop : word count `vars'
set obs `obs'

forvalues i = 1/`stop' {
	local var : word `i' of `vars'
	local type : word `i' of `types'
	if strpos("`type'", "str") > 0 {
		qui gen `type' `var' = ""
	}
	else {
		qui gen `type' `var' = .	
	}
}

You now should have an empty dataset to load the data by using the command

        . plugin call connect_to_mysql *, load

The dataset created in Stata is

 . qui compress

 . describe

 Contains data
   obs:             2                          
  vars:             4                          
  size:            16 (99.9% of memory free)
 -------------------------------------------------------------------------------
               storage  display     value
 variable name   type   format      label      variable label
 -------------------------------------------------------------------------------
 id              byte   %10.0g                 
 var1            byte   %10.0g                 
 var2            byte   %10.0g                 
 var3            str1   %9s                    
 -------------------------------------------------------------------------------
 Sorted by:  
      Note:  dataset has changed since last saved

 . list

      +-------------------------+
      | id   var1   var2   var3 |
      |-------------------------|
   1. |  1      2      3      a |
   2. |  2      5      6      b |
      +-------------------------+

Let us say that you added a few observations to your dataset in Stata using the commands:

 . set obs 4
 . replace id = _n in 3/4
 . replace var1 = 3 in 3/4
 . replace var2 = 4 in 3/4
 . replace var3 = "a" in 3/4

If you wanted to add the new observations to the database, you would first create a macro of the Stata variables’ names and types,

 . unab myvars: _all
 . foreach var of local myvars {
 . 	local type: type `var'
 . 	local mytypes " `mytypes' `type'"
 }	

and then call the plugin with the correct observation range,

        . plugin call connect_to_mysql * in 3/4, write

The stata_test table in MySQL should now be

 mysql> select * from stata_test;
 +----+------+------+------+
 | id | var1 | var2 | var3 |
 +----+------+------+------+
 |  1 |    2 |    3 | a    |
 |  2 |    5 |    6 | b    |
 |  3 |    3 |    4 | a    |
 |  4 |    3 |    4 | a    |
 +----+------+------+------+
 4 rows in set (0.04 sec)

Conclusion

Using plugins to import and export data from databases into Stata allows you to be as specific or as general as you like with the types of database applications and databases to which you can connect. For example, if you wanted a plugin that connected to several databases in MySQL, you could alter connect_to_mysql to have the syntax

connect_to_mysql [varlist] [if] [in] [, [create|load|write] database(database_name) table(table_name) user(username) pass(password)]

This syntax is a little harder to parse and error check in C, but it allows you more flexibility to connect to multiple databases in MySQL.