Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Re: odbc load from SQL if ID is in my Stata master dataset?


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: odbc load from SQL if ID is in my Stata master dataset?
Date   Tue, 18 Sep 2012 21:35:53 +0900

Jen Zhen wrote:

I have a list of person ID numbers for whom I would like to merge in
information from a large SQL dataset (containing the IDs in the first
list, plus many more).

I assume that I cannot directly merge in from the SQL dataset, but
must first load the SQL dataset into Stata, save as dta, and then I
can merge that into my main dataset.

However, I'm wondering how, when loading from SQL with --odbc load-- I
can best specify that I'm only interested in observations with a fixed
list of ID numbers?

What would work is to write "WHERE id=51 OR id=332 OR ...", but since
I have several hundred valid IDs this seems inefficient.

Would anyone happen to know a better way?

--------------------------------------------------------------------------------

Have you considered creating a temporary table in the database, uploading the
Stata dataset of IDs into it (using -odbc insert-), and finally -odbc load,
exec()- where the exec()'s SQL statement INNER JOINs the SQL table and your
temporary table of IDs ON SQLdataset.ID = TemporaryStataTable.ID?  This would
probably be the most efficient and cleanest.  On the downside, if it's a
so-called production database, then you very likely don't have sufficient data
definition language privileges to go this route.  (You might want to talk to
your database administrator and see whether you can get privileges to create,
populate and drop temporary tables in an out-of-the way location or schema
inside the database.)

I doubt that the string length limits of ODBC's SQL statements will allow either
your WHERE clause or the following, but you can try something like WHERE ID IN
() as a shorthand for your WHERE clause, assuming that ID's a primary key and so
never NULL.

    local SQL_statement SELECT * FROM MyTable WHERE ID IN (
    drop if missing(ID)
    quietly count
    forvalues row = 1/`=`r(N)'-1' {
        local SQL_statement `SQL_statement' `=ID[`row']',
    }
    local SQL_statement `SQL_statement' `=ID[`r(N)']');
    odbc load, exec("`SQL_statement'") . . .

I recall StataCorp's mentioning an inside-the-database version of Stata.  I'm
not sure how that might work for you, but it's something worth looking into,
too.

Joseph Coveney




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index