How do I connect to a database by using a Stata plugin?
| Title |
|
Using a plugin to connect to a database |
| Author |
Kevin Crow, StataCorp |
| Date |
January 2007 |
ODBC vs. plugin
The easiest way to import data from a database directly into Stata is to use
the odbc
command. However, there are occasions where the odbc command
will not work or is not the best solution for importing data. For example,
- the odbc command will not work on your
operating system (Solaris),
- there is not an ODBC driver for the database in question, or
- ODBC is too slow.
If you encounter any of the above problems, you can use a Stata plugin to
import and export your data directly to your database if your database has
an application programming interface (API). Most database applications have
an API, so the only real question is how to connect Stata to the database by
using the API.
This FAQ assumes that you have read and understood the FAQ on Stata
plugins at the following URL:
http://www.stata.com/plugins/
The example will use ANSI C as the plugin langauge and gcc as the
compiler. Some of the gcc compiler flags may be different from this
example depending on your operating system. You can also use C++ or a
different compiler. These issues are covered in the plugin FAQ.
Here I will connect Stata to a MySQL database. Even though this FAQ
explains how to connect to a MySQL database, you should be able to extend
the example to connect to other databases (e.g., Oracle, PostgreSQL,
Sybase).
Create a test database
First, you need to create a test database in MySQL. If you have the
privileges to create databases in MySQL, you can create a database with the
MySQL command CREATE DATABASE stata_test. If you don’t have
the privileges, you should contact your network administrator to create the
database. After creating the database, you need to create a user that has
the appropriate permissions to connect, query, etc., to the database
stata_test (again, you might need your network administrator for this
step). Depending on your version of MySQL, you should be able to give the
correct permission to access the database by using the following MySQL
command:
GRANT ALL PRIVILEGES ON stata_test.* TO 'user'@'localhost' IDENTIFIED BY
'password';
Note: Change user, localhost,
and password in the above command.
Now that a user has been created with the correct permission, you can
populate the database with the following SQL commands:
DROP TABLE IF EXISTS stata_test ;
CREATE TABLE stata_test
(
id INT NOT NULL,
var1 INT NOT NULL,
var2 INT NOT NULL,
var3 VARCHAR(1) NOT NULL,
PRIMARY KEY (id)
) TYPE = INNODB;
INSERT INTO stata_test (id,var1,var2,var3) VALUES(1,2,3,'a'), (2,5,6,'b') ;
An easy way to run these commands is to put them into a text file called
stata_test.sql and run the command
mysql -u user -p stata_test < stata_test.sql
from your terminal. After typing your password to run the
stata_test.sql file, you should have a database table in MySQL
that looks like
mysql> select * from stata_test;
+----+------+------+------+
| id | var1 | var2 | var3 |
+----+------+------+------+
| 1 | 2 | 3 | a |
| 2 | 5 | 6 | b |
+----+------+------+------+
2 rows in set (0.00 sec)
Plugin code
With the database set up, you can now create your C Stata plugin. For this
example, I am going to call my plugin connect_to_mysql, and it will
have the following Stata syntax:
connect_to_mysql [varlist] [if] [in], [create|load|write]
|
Options:
|
|
create |
creates the Stata dataset into which the data
are to be loaded. You must use the create option before load.
|
|
load |
imports the data into an empty Stata dataset.
|
|
write |
exports data in Stata to an existing database table.
|
connect_to_mysql can load a table from a database, and it can write
to an existing table in a database. The C code for the
connect_to_mysql command is
connect_to_mysql.c (you
will have to change user and password in
connect_to_mysql.c to connect to your database).
The plugin consists of three functions: get_results(),
load_data(), and write_data(). The function
get_results() connects to the database, queries the database, and
returns the results. get_results() also creates three Stata macros:
one that contains the names of the variables, one that contains the data
types of the variables, and one that contains the total number of
observations. These macros must be created because you cannot create
observations and variables in Stata from a plugin. You must pass the macros
back to Stata so that you can create the variables and set the number of
observations. load_data() loops over the MySQL results set and
stores the data to Stata. write_data() loops over Stata observations
and variables and writes data back to the database.
Compile plugin
To create C plugins, you will need to download two extra files: the Stata C
source file, stplugin.c, and Stata
C header file, stplugin.h.
You will also need to locate the path to the MySQL header file and
mysqlclient libraries. Usually, these files are in the directory
where you installed MySQL. For example, on a Linux machine, if I installed
MySQL in /usr/local/mysql, then the header files would be in
/usr/local/mysql/include/mysql/, and the library files would be in
/usr/local/mysql/lib/mysql/. If you are not sure where the MySQL header
file or libraries are, you can use the find command to locate them.
find / -mount -name "mysql.h"
find / -mount -name "libmysqlclient.so"
Once you know the paths to the MySQL header file and libraries, and you have
downloaded the Stata source file and header file, you can compile the plugin
on a 32-bit Linux machine by using gcc with the command:
$ gcc -shared -DSYSTEM=OPUNIX stplugin.c connect_to_mysql.c -o connect_to_mysql.plugin -I /usr/local/mysql/include/mysql/ -L /usr/local/mysql/lib/mysql/ -l mysqlclient
Call plugin from Stata
Now that the plugin has been compiled you can call it from Stata to load
data from MySQL. The first step in Stata is to create an empty dataset to
load the data by using the command plugin call connect_to_mysql *,
create.
clear
program connect_to_mysql, plugin
plugin call connect_to_mysql, create
// Create dataset using macros passed back from plugin
local stop : word count `vars'
set obs `obs'
forvalues i = 1/`stop' {
local var : word `i' of `vars'
local type : word `i' of `types'
if strpos("`type'", "str") > 0 {
qui gen `type' `var' = ""
}
else {
qui gen `type' `var' = .
}
}
You now should have an empty dataset to load the data by using the command
. plugin call connect_to_mysql *, load
The dataset created in Stata is
. qui compress
. describe
Contains data
obs: 2
vars: 4
size: 16 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id byte %10.0g
var1 byte %10.0g
var2 byte %10.0g
var3 str1 %9s
-------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. list
+-------------------------+
| id var1 var2 var3 |
|-------------------------|
1. | 1 2 3 a |
2. | 2 5 6 b |
+-------------------------+
Let us say that you added a few observations to your dataset in Stata
using the commands:
. set obs 4
. replace id = _n in 3/4
. replace var1 = 3 in 3/4
. replace var2 = 4 in 3/4
. replace var3 = "a" in 3/4
If you wanted to add the new observations to the database, you would first
create a macro of the Stata variables’ names and types,
. unab myvars: _all
. foreach var of local myvars {
. local type: type `var'
. local mytypes " `mytypes' `type'"
}
and then call the plugin with the correct observation range,
. plugin call connect_to_mysql * in 3/4, write
The stata_test table in MySQL should now be
mysql> select * from stata_test;
+----+------+------+------+
| id | var1 | var2 | var3 |
+----+------+------+------+
| 1 | 2 | 3 | a |
| 2 | 5 | 6 | b |
| 3 | 3 | 4 | a |
| 4 | 3 | 4 | a |
+----+------+------+------+
4 rows in set (0.04 sec)
Conclusion
Using plugins to import and export data from databases into Stata allows you
to be as specific or as general as you like with the types of database
applications and databases to which you can connect. For example, if you wanted a
plugin that connected to several databases in MySQL, you could alter
connect_to_mysql to have the syntax
connect_to_mysql [varlist] [if] [in] [, [create|load|write]
database(database_name)
table(table_name)
user(username)
pass(password)]
This syntax is a little harder to parse and error check in C, but it allows
you more flexibility to connect to multiple databases in MySQL.
|