Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: Using OBDC to reference a Stata file?


From   "Jeph Herrin" <stata@spandrel.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Using OBDC to reference a Stata file?
Date   Wed, 2 Oct 2013 10:22:50 -0400

Thanks Sergiy,

I think I fixed the email problem, here's hoping this one shows up with my
name.

Also thanks for the tips. In the end I did manage to stack the queries. I
get back about 10million obs, which is manageable, whereas before it was
136m (=21gb) which was giving me some trouble on the 24gb (virtual) machine.

cheers,
J


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Tuesday, October 01, 2013 9:49 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using OBDC to reference a Stata file?

Jeph, first please kindly check the settings of your mailer to put your name
to the posts. They are currently appear as postings from 'Stata', that might
confuse quite a bit. See how it appears here in
archives:
http://www.stata.com/statalist/archive/2013-10/
Real name requirement is from statalist faq, #2.1.3

Second, you must communicate the list of the desired ids to the sql server
one way or another. Either you give the rule for selection (this is the
first approach, where the server will mine the ids itself), or you go the
direct approach and submit a mega query:

SELECT *
FROM CustomerOrders co
WHERE co.CustomerID = 'id0001' OR co.CustomerID = 'id0021' OR co.CustomerID
= 'id0302'....................

Formation of this query is easily automated by looping over the returned ids
from your first query.

You should be concerned about the length of the query.
This might be helpful to read:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/b1ae1a07-dd26-4cd6-a
b25-3215ec1827c6/maximum-length-of-a-sql-statement

and you are probably going to hit some of the Stata limits too unless your
query is expected to return just a handful of results (obtained perhaps as a
result of a very complex query).

Best, Sergiy Radyakin



On Tue, Oct 1, 2013 at 9:25 AM,  <stata@spandrel.net> wrote:
> Thanks. You are roughly correct,  though in fact my first (and second) 
> queries already consiste of stacked sets of JOIN statements in order 
> to construct the different lists. I have tried combining the two, but 
> it becomes almost impossible to read/debug, so my hope was to create 
> one list and then refer to it when creating the second.
>
> Cheers,
> J
>
>
>
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy 
> Radyakin
> Sent: Monday, September 30, 2013 6:01 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Using OBDC to reference a Stata file?
>
> Jeph, suppose your original mySQL is something like:
>
> SELECT *
> FROM Customers c
> WHERE c.State = 'NY'
>
> and now you want to get their orders. Then just do the inner join with 
> the same condition in -where-.
>
> SELECT *
> FROM Customers c
> INNER JOIN CustomerOrders co
>     ON co.CustomerID = c.CustomerID
> WHERE c.State = 'NY'
>
> There are two (at least) downsides:
> 1. you can not pre-filter the list of ids in Stata (after your first
query).
> You have to put everything in SQL, and 2. this will be a second query 
> to the database, which potentially might produce more records than the 
> original query (if the DB was updated in the meanwhile).
>
> Best, Sergiy Radyakin
>
>
> On Mon, Sep 30, 2013 at 4:30 PM,  <stata@spandrel.net> wrote:
>> I'm trying to use Stata to construct tables from a very large 
>> database (ie,
>> terrabytes) using ODBC.  I'm new to SQL, but because of the size of 
>> the database am trying to do as much as I can via the SQL requests to 
>> avoid working with very large files.
>>
>> Currently, I am able to create, say, table1.dta which is a list of 
>> identifiers that meet a certain criteria; this results from a very 
>> complex SQL statement that I submit via
>>
>> . odbc, exec(`mySQL')
>> . save table1
>>
>> Now I would like to use table1 in my next SQL query so I select only 
>> those identifiers. Possible?
>>
>> It seems not, but I feel compelled to ask because the advice I have 
>> received from those who routinely use this database is to use SAS, 
>> which will in fact let me reference SAS datasets in an SQL procedure. 
>> I'd
> prefer not to.
>>
>> Thanks,
>> Jeph
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index