Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Re: -odbc- with Microsoft Excel driver: weird behaviourinvolving missing values


From   "Eva Poen" <lexep2@nottingham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: -odbc- with Microsoft Excel driver: weird behaviourinvolving missing values
Date   Fri, 18 Aug 2006 18:18:30 +0100

Thank you, Kevin, for this easy to understand explanation of what is happening. 
Some colleagues keep sending me Excel files with data, and it is good to know 
that ODBC is _not_ a save way to import them into Stata (which is a pity, because 
I like the system a lot in conjunction with Access). The advice listed on 
http://www.ats.ucla.edu/stat/Stata/faq/odbc.htm does not make a point about
this.

Thanks also to Joseph for illustrating a workaround; I wasn't able to even guess 
what the underlying problem was. 

Eva


>>> kturner@stata.com 17/08/06 20:24 >>>
Eva Poen (lexep2@nottingham.ac.uk) writes:

> Symptoms: If a variable in the spreadsheet contains a certain number
> of missings, it will be imported with _all_ values set to missing,
> even if it has non-missing values in Excel. I work under Windows XP
> SP2, Stata 9.2, and Excel 2003.

This is an issue with the ODBC Excel driver. This is what I know:

Excel is not a database, but a spreadsheet. This means every Excel cell in a
column can have a different type. Note: Excel 'Formats' are not representative
of Excel's underlying cell type. Your blank/missing cell is not necessarily a
blank number or a blank string, just a blank. Therefore, when ODBC looks at a
column, it needs to determine a type for the column. That's a requirement for
ODBC and databases in general. To solve this problem, the driver apparently 
takes a sample of values from which to guess the column type. This is where 
your magic number applies. In your case, if it determines that the column is 
of blank type (unknown), it simply defaults to string. 

What comes next is really an unfortunate feature of the ODBC-Excel driver. 
When it does finally decide on a type for the whole column, it will only truly
import cells that EXACTLY match that type. It does no type conversion. For
non-conforming types, Stata either gets an empty string, or a NULL value, which
we handle accordingly. In your case, when the driver defaulted to string type
for the column, all the numeric values were seen as blank strings to Stata.

All of this is to say that the best solution might be to avoid Excel if you
know your columns have mixed types, or a lot of blank cells. Excel was never
designed to be a database even though people (myself included) use it as a
crude one.

--Kevin
kturner@stata.com 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html 
*   http://www.stata.com/support/statalist/faq 
*   http://www.ats.ucla.edu/stat/stata/


This message has been checked for viruses but the contents of an attachment
may still contain software viruses, which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index