Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Data loss loading integers via ODBC


From   jhassell@stata.com (James Hassell, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Data loss loading integers via ODBC
Date   Mon, 09 Jul 2007 14:57:33 -0500

Dave Ewart <davee@ceu.ox.ac.uk> reported a case in which Stata could loose
data when importing via ODBC.  He was importing data from PostgreSQL and 
MySQL, and the data was stored using the "smallint" type.  Stata stores
such values as -int-.  As Dave wrote, 

> In PostgreSQL and MySQL, there exists a smallint type: -32768 to +32767
>
> However, Stata's 'int' type is -32,767 to 32,740

The values 32,741 through 32,767 are used to store Stata's missing values, .,
.a, .b, ..., .z.  Thus, values in the original data between 32,741 and 32,767
turn into Stata missing values.

Dave also noted the same problem can arise with -byte- and -long-.

We will fix the problem.  In the meantime, we have a workaround.


Workaround 1
------------

We have written command -fixsql- to fix the problem after the data have 
been imported.  To obtain the command, type 

        . net from http://www.stata.com
        . net cd users/jhassell
        . net install fixsql

This will install -fixsql- which has syntax 

        fixsql  <varlist>

This program can fix the problem Dave describes ASSUMING THERE WERE 
NO NULL VALUES IN THE ORIGINAL SQL DATASET.

For each variable in <varlist> that is currently a -byte-, -int-, or -long-,
and that contains missing values, -fixsql- will convert it to -int-, -long-,
or -double-, and fix each missing value to be the original, appropriate value.

Thus, to fix the problem Dave describe, he could just type

        . fixsql _all

but it would be safer if he typed

        . fixsql f

because f was the name of the variable Dave mentioned.



Workaround 2
------------

Workaround 2 is for the case where there were NULL values in the original.
This workaround passes the SQL statement directly to ODBC and asks the
database to cast the result to the desired type.

In PostgreSQL, Dave could do the following...

        . odbc load, exec("SELECT CAST(f AS float8) from <table name>")
                     dsn("<dsn>")

The above would also work for MySQL, but for other database types, what you 
need to type is vendor specific.


-- James
   jhassell@stata.com
~                                       
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index