Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: selecting obs while reading in huge data set


From   Michael Ingre <Michael.Ingre@ipm.ki.se>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: selecting obs while reading in huge data set
Date   Thu, 19 Aug 2004 15:11:17 +0200

On 2004-08-19, at 10.15, Steve Stillman wrote:

Sascha,
I have a recommendation that I wouldn't usually make. I have been
recently doing work with matched employer-employee data with over 30
million obs, so we have been running into the same problem as you. SAS
is much better for large dataset merges than Stata. In particular, proc
SQL is remarkably fast at doing these types of merges (likely because
SQL is written with this type of operation in mind).

Well there is was, likely the last time I will recommend SAS over Stata.

Cheers,
Steve
Steve and Sascha

There is a solution that does not include SAS. Stata also support SQL databases. If you set up an ODBC connection to an SQL database then the -odbc load- command will allow you to load datasets directly from the SQL server. It is even possible to execute a SQL statement directly from Stata with -odbc exec("SqlStmt")- or -odbc sqlfile("filename")-. This way you could merge and load only the observations you are interested in directly from Stata in one command.

Before you can do this you need to download your data to an SQL server. If you don't have access to one, you could download one for FREE here: http://dev.mysql.com/downloads/mysql/4.0.html. After that you need to set up and ODBS driver manager, and an ODBC driver and download your data.

This is a bit of work but if you plan to do it a lot, it should be worth it.

I tried it a couple of months ago and it worked very well with Stata, however, labels are lost and there is only one code for missing values. Also, saving and storing data is a bit slower than from disc files.

If you have a Mac I would recommend the Complete MySQL package that has an easy set up and comes with drivers and additional software: http://www.apple.com/downloads/macosx/unix_open_source/ completemysql.html

Michael


*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index