Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: ODBC vs. insheet variable format troubles (importing SQL data)


From   Abe N <aben468@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: ODBC vs. insheet variable format troubles (importing SQL data)
Date   Wed, 23 Jan 2013 16:43:44 -0800 (PST)

Hi, thanks for the reply!  The code I'm using is:

odbc load, exec("select * from dbo.ResearchSource where PRINCIPAL_PROCEDURE_GROUP between 302034 and 302039") dsn("servername") lowercase

This pulls roughly 30k records with 160 variables.  Some of the variables are over 32 characters so it renames it to var35 for example and I manually rename them after.  To double check that something was messing up, what I did was a duplicates tag command based on patient psuedo id, hospital, admission date, and diagnoses.  Almost 10-15k of the records were some form of duplicate.  I then looked at a set of 12 duplicate records, found their unique record_id's by using tab, and pulled up those records in SQL using

select * from ResearchSource where (RECORD_ID=some# OR RECORD_ID=some# OR etc)

I then cross checked using STATA's data browser and what was in SQL and they did not match.  Basically the first row/record_id had accurate values, but then the other 11 had repeated values from the first row (except for record_id which remained unique).  Not sure why it was doing this.  I can't copy the output here, the security settings on the server I work on are quite stringent.  

Lastly, I'm not sure how to check what ODBC manager I'm using.  Is there a way to check?  From reading the help file I'm guessing iODBC is what it's using because I haven't changed it and I'm not on unix, but not 100% sure.



--- On Wed, 1/23/13, Dimitriy V. Masterov <dvmaster@gmail.com> wrote:

> From: Dimitriy V. Masterov <dvmaster@gmail.com>
> Subject: Re: st: ODBC vs. insheet variable format troubles (importing SQL data)
> To: "Statalist" <statalist@hsphsun2.harvard.edu>
> Date: Wednesday, January 23, 2013, 2:08 PM
> Abe,
> 
> I load SQL data via odbc without any hiccups, and at first I
> spent a
> fair amount of time repeating calculations in SQL and Stata
> to make
> sure I was doing things correctly.
> 
> Could you replicate the error with a simple query like
> "select top 5
> record_id from ..."? It might help if we see the code and
> the output.
> I would recommend solving this rather than the insheet
> approach. Also,
> which odbc manager are you using?
> 
> DVM
> 
> On Wed, Jan 23, 2013 at 1:06 PM, Abe N <aben468@yahoo.com>
> wrote:
> > Hello Everyone,
> >
> > Basically wondering if there is a way I can conserve
> variable data formats when I bring data manually from SQL to
> STATA using insheet (see below for more explanation and
> trouble I had with ODBC).
> >
> > I was originally told by the people maintaining the SQL
> database to use:
> >
> > odbc load, exec("Select command here")
> dsn("databasename") lowercase
> >
> > to bring data into STATA, which seemed to be working
> great for my purposes until I did a duplicates report and
> found that I had way too many duplicates.  After
> cross-checking by record_id (unique) with the original SQL
> data I found the odbc command wasn't importing data
> properly.  I had already set up the variable list
> though (some were too long so I had to manually rename
> them), so I used describe and I think display
> "`r(varlist)'", copied that varlist to a text file for what
> I would do in my next step using insheet.
> >
> > Basically, ran my SQL query in SQL Server Management
> Studio 2008 R2, saved the results/data as a tab delimited
> text and used:
> >
> > insheet (list of 160 variables) using data.txt, tab
> >
> > Which seems to be working great.  The only problem
> I'm having now is that while odbc maintained the variable
> data formats (string vs double, etc from SQL), this new
> method doesn't and it would be much easier for me if it
> didn't convert some of my strings into numbers, etc.
> >
> > So what I'm wondering is if there's a way I can sort of
> generate the variables, their data formats, and labels using
> my original odbc method, clear out that incorrect data, and
> then pull in data from the text file afterward?  If
> not, is there a way I can carry over at least the variable
> labels quickly rather than manually doing all 160 or so?
> >
> > Sorry for the long winded explanation, but hope the
> situation/question gets across clearly.
> >
> > Best,
> > Abe Noorbakhsh
> > aben468@yahoo.com
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/faqs/resources/statalist-faq/
> > *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index