Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Re: Cannot get date fields to import from PostgreSQL via ODBC


From   Dave Ewart <davee@ceu.ox.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: Cannot get date fields to import from PostgreSQL via ODBC
Date   Wed, 28 Sep 2005 09:58:23 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday, 27.09.2005 at 17:07 +0100, Nick Cox wrote:

> > > Can you suggest a way to 'transparently' import these dates as
> > > 'proper' date variables?
> > 
> > Reply aimed at Kevin Crow:
> > 
> > Joseph Coveney's very helpful workaround notwithstanding, can I assume
> > that a direct import of date-type values over ODBC in this 
> > manner is not
> > possible with Stata?
> > 
> > Is this considered a bug by the Stata developers?  (This isn't a
> > criticism, rather a request for clarification!)
>
> For Stata, date variables are not a variable type, 
> or a set of variable types. Dates are held as integer-valued 
> numeric variables and the date part works through formats 
> and functions. 
> 
> In Stata, naturally, you can also hold a date as a string, 
> but its date attributes then lie in the eye of the beholder. 
> 
> Other programs have other attitudes, but the key point here, 
> I believe, is that Stata has no concept of a date-type 
> variable and even less concept of the meaning of a string 
> variable. 

Yeah, I appreciate the way that a date is stored 'behind the scenes',
however its failure to work as expected regarding importing from ODBC
sources is inconsistent with the way other data types are identified and
used:

e.g. An example table 'mixbag' which was defined as follows in
PostgreSQL and is artificially created to contain many different data
types: (c was originally 'float' and d was 'double precision', but I
think I have an option set to default all reals to double precision
hence c and d look the same)

 Column |           Type
- --------+--------------------------
 a      | smallint
 b      | integer
 c      | double precision
 d      | double precision
 e      | date
 f      | time without time zone
 g      | timestamp with time zone
 h      | character(10)
 i      | character varying(10)
 j      | text

In Stata:

. odbc describe mixbag

DataSource: pgstuff (query)
Table:      mixbag (load)
- -------------------------------------------------------------------------------
Variable Name                               Variable Type
- -------------------------------------------------------------------------------
a                                           int2
b                                           int4
c                                           float8
d                                           float8
e                                           date
f                                           time
g                                           timestamptz
h                                           bpchar
i                                           varchar
j                                           text
- -------------------------------------------------------------------------------

So, at the 'describe' stage, Stata is perfectly aware of the correct
data types for each field.

A non-loading select shows the correct data:

. odbc exec("select * from mixbag"), dsn(pgstuff)

    1. +------------------------------------------------------------------------+
       |a | 1                                                                   |
       |b | 2                                                                   |
       |c | 1.5                                                                 |
       |d | 2.123412341234                                                      |
       |e | 2005-09-23                                                          |
       |f | 15:15:15                                                            |
       |g | 2005-09-23 15:15:15                                                 |
       |h | abcdefghij                                                          |
       |i | abcde                                                               |
       |j | blah blah blah blah blah                                            |
       +------------------------------------------------------------------------+

Now we load the data for real:

. odbc load, exec("select * from mixbag") dsn(pgstuff)

. list

     +-------------------------------------------------------------------------+
  1. | a | b |   c |         d |         e |        f |         g |          h |
     | 1 | 2 | 1.5 | 2.1234123 | 01jan1960 | 15:15:15 | 01jan1960 | abcdefghij |
     |-------------------------------------------------------------------------|
     |              i           |                                  j           |
     |          abcde           |           blah blah blah blah blah           |
     +-------------------------------------------------------------------------+

. desc

Contains data
  obs:             1                          
 vars:            10                          
 size:           314 (99.9% of memory free)
- -------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
- -------------------------------------------------------------------------------
a               int    %8.0g                  
b               long   %12.0g                 
c               double %10.0g                 
d               double %10.0g                 
e               double %td                    
f               str8   %9s                    
g               double %td                    
h               str10  %10s                   
i               str10  %10s                   
j               str244 %244s                  
- -------------------------------------------------------------------------------

So:

    1. Integer variables are identified as integers, and are correctly
    imported as integers and appropriately formatted as integers - THIS
    IS OK;

    2. Real variables are identified as reals, and are correctly
    imported as reals and appropriately formatted as reals - THIS IS OK;

    3. True string variables are identified as strings, are imported
    correctly as strings and appropriately formatted as strings - THIS
    IS OK;

    4. The date and date/time variables, e and g, are correctly
    identified as 'date' and 'timestamptz', Stata tries to format these
    variables as dates with the format %td (thus 'believing' them to be
    dates), yet it has failed to actually import the correct values.
    The time variable, f, is similarly mis-handled.  THIS IS *WRONG*

My point (hey, I got there in the end!) is that the date fields are
handled inconsistently compared to the other data types.  And, as such,
my request to Kevin about whether he considers this to be a bug or not
...

Dave.
- -- 
Dave Ewart
davee@ceu.ox.ac.uk
Computing Manager, Cancer Epidemiology Unit
Cancer Research UK / Oxford University
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370
Get key from http://www.ceu.ox.ac.uk/~davee/davee-ceu-ox-ac-uk.asc
N 51.7518, W 1.2016
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFDOlsvbpQs/WlN43ARAia4AKCeGLlLNS4XdDybf+tV+iLR1CDlBACaAiW2
O3bydZkZg4MLeyurWqlddHI=
=snzp
-----END PGP SIGNATURE-----
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index