[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: selecting obs while reading in huge data set
On 2004-08-19, at 10.15, Steve Stillman wrote:
Steve and Sascha
I have a recommendation that I wouldn't usually make. I have been
recently doing work with matched employer-employee data with over 30
million obs, so we have been running into the same problem as you. SAS
is much better for large dataset merges than Stata. In particular,
SQL is remarkably fast at doing these types of merges (likely because
SQL is written with this type of operation in mind).
Well there is was, likely the last time I will recommend SAS over
There is a solution that does not include SAS. Stata also support SQL
databases. If you set up an ODBC connection to an SQL database then the
-odbc load- command will allow you to load datasets directly from the
SQL server. It is even possible to execute a SQL statement directly
from Stata with -odbc exec("SqlStmt")- or -odbc sqlfile("filename")-.
This way you could merge and load only the observations you are
interested in directly from Stata in one command.
Before you can do this you need to download your data to an SQL server.
If you don't have access to one, you could download one for FREE here:
http://dev.mysql.com/downloads/mysql/4.0.html. After that you need to
set up and ODBS driver manager, and an ODBC driver and download your
This is a bit of work but if you plan to do it a lot, it should be
I tried it a couple of months ago and it worked very well with Stata,
however, labels are lost and there is only one code for missing values.
Also, saving and storing data is a bit slower than from disc files.
If you have a Mac I would recommend the Complete MySQL package that has
an easy set up and comes with drivers and additional software:
* For searches and help try: