Stata The Stata listserver

RE: st: selecting obs while reading in huge data set

From   "Hoetker, Glenn" <[email protected]>
To   <[email protected]>
Subject   RE: st: selecting obs while reading in huge data set
Date   Thu, 19 Aug 2004 10:11:06 -0500

of some amazing things and Jeroen Weesie's mmerge command simplifies
many things that I used to use SAS's SQL command for.  However, with
really large data sets, Stata still lags, sometimes unacceptably.  Using
the odbc command with mysql or a similar SQL database program is a nice
compromise (although I must say that SAS's SQL command is much simpler
than most things SAS does; also, their book "SAS Guide to the SQL
Procedure" is quite good, and most of it is not SAS-specific).
       Glenn Hoetker
       Assistant Professor of Strategy
       College of Business
       University of Illinois at Urbana-Champaign
       [email protected]
       -----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Ingre
Sent: Thursday, August 19, 2004 8:11 AM
To: [email protected]
Subject: Re: st: selecting obs while reading in huge data set
       On 2004-08-19, at 10.15, Steve Stillman wrote:
       > Sascha,
       > I have a recommendation that I wouldn't usually make.  I have
       > recently been doing work with matched employer-employee data with
       > over 30 million obs, so we have been running into the same problem
       > as you.  SAS
       > is much better for large dataset merges than Stata.  In particular,
       > proc SQL is remarkably fast at doing these types of merges (likely
       > because SQL is written with this type of operation in mind).
       > Well, there it was, likely the last time I will recommend SAS over
       > Stata.
       > Cheers,
       > Steve
       Steve and Sascha
       There is a solution that does not involve SAS. Stata also supports
       databases. If you set up an ODBC connection to an SQL database, then
       the -odbc load- command will allow you to load datasets directly from
       the SQL server. It is even possible to execute a SQL statement
       from Stata with -odbc exec("SqlStmt")- or -odbc
       This way you could merge and load only the observations you are  
       interested in directly from Stata in one command.
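
       As a rough illustration of the idea above, a do-file along these
       lines might work. It is only a sketch: the data source name
       "mydsn", the tables "workers" and "firms", and every column name
       are made up for illustration, and it assumes the ODBC data source
       has already been configured on the machine. The exec(), dsn() and
       clear options are those of -odbc load-.

```stata
* Hypothetical names throughout: "mydsn", "workers", "firms" and all
* column names are placeholders, not taken from the original post.

* Show the ODBC data sources Stata can see, then the tables in one of them.
odbc list
odbc query "mydsn"

* Build the SQL statement in pieces so that no line gets too long.
local sql "SELECT w.worker_id, w.firm_id, w.wage, f.industry"
local sql "`sql' FROM workers w JOIN firms f ON w.firm_id = f.firm_id"
local sql "`sql' WHERE w.year = 2003"

* The database does the merge and the row selection; Stata loads
* only the resulting observations.
odbc load, exec("`sql'") dsn("mydsn") clear

describe
```

       The point of pushing the JOIN and WHERE into the SQL statement is
       that the full tables never have to fit into Stata's memory; only
       the selected rows are transferred.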
       Before you can do this you need to download your data to an SQL
       If you don't have access to one, you could download one for FREE
       here: After that you need to set up an ODBC driver manager and an
       ODBC driver, and download
       This is a bit of work, but if you plan to do it a lot, it should be
       worth it.
       I tried it a couple of months ago and it worked very well with
       however, labels are lost and there is only one code for missing
       Also, saving and storing data is a bit slower than from disc
       If you have a Mac I would recommend the Complete MySQL package that
       has an easy setup and comes with drivers and additional software:
       *   For searches and help try:
