From: Jeph Herrin <info@flyingbuttress.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: large data sets (was st: A faster way to gsort)
Date: Fri, 14 Mar 2014 10:04:05 -0400
On 3/13/2014 10:44 PM, Joseph Coveney wrote:

> It sounds like you're pulling modest-to-large result sets out of the database, saving them as SAS dataset files and then going back and sort-merging them via PROC SQL with multigigabyte-sized result sets likewise pulled out of the database en passant--a situation that even SAS aficionados recommend avoiding in favor of pass-through queries.
I have not been doing that, but it is what the SAS analysts in this environment do - and it's one reason they prefer not to use Stata. I do as much as I can in native SQL and then roll the results up in Stata. But this requires iterating queries over, e.g., calendar year to ensure that the result sets I pull down are manageably small.
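For anyone curious what that year-by-year iteration looks like in practice, here is a minimal sketch in Stata using -odbc load- and -append-. The DSN name, table, and column names are all hypothetical placeholders, not the actual setup described above:

```stata
* Hypothetical sketch: pull one calendar year at a time so each
* result set stays small, then accumulate with -append-.
* "mydsn", "claims", and the column names are made up for illustration.
tempfile results
local first 1
forvalues y = 2005/2012 {
    odbc load, ///
        exec("select id, dx, svcdate from claims where year(svcdate) = `y'") ///
        dsn("mydsn") clear
    if `first' {
        save `results', replace
        local first 0
    }
    else {
        append using `results'
        save `results', replace
    }
}
use `results', clear
```

The filtering happens server-side in the WHERE clause, so Stata never holds more than one year (plus the accumulated file) in memory at once.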
But the first point is an important one: my primary role here is not data analyst; mostly, other analysts use SAS to create datasets that I can then analyze in Stata. And it is likely to stay that way as long as SAS has the edge on data management with large databases.
cheers,
Jeph

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/