[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
st: Re: -odbc- with Microsoft Excel driver: weird behaviour involving missing values
Eva Poen (firstname.lastname@example.org) writes:
> Symptoms: If a variable in the spreadsheet contains a certain number
> of missings, it will be imported with _all_ values set to missing,
> even if it has non-missing values in Excel. I work under Windows XP
> SP2, Stata 9.2, and Excel 2003.
This is an issue with the ODBC Excel driver. This is what I know:
Excel is not a database, but a spreadsheet. This means every Excel cell in a
column can have a different type. Note: Excel 'Formats' are not representative
of Excel's underlying cell type. Your blank/missing cell is not necessarily a
blank number or a blank string, just a blank. Therefore, when ODBC looks at a
column, it needs to determine a type for the column. That's a requirement for
ODBC and databases in general. To solve this problem, the driver apparently
takes a sample of values from which to guess the column type. This is where
your magic number applies. In your case, if it determines that the column is
of blank type (unknown), it simply defaults to string.
What comes next is really an unfortunate feature of the ODBC-Excel driver.
When it does finally decide on a type for the whole column, it will only truly
import cells that EXACTLY match that type. It does no type conversion. For
non-conforming types, Stata either gets an empty string, or a NULL value, which
we handle accordingly. In your case, when the driver defaulted to string type
for the column, all the numeric values were seen as blank strings to Stata.
All of this is to say that the best solution might be to avoid Excel if you
know your columns have mixed types, or a lot of blank cells. Excel was never
designed to be a database even though people (myself included) use it as a
* For searches and help try: