Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: -odbc- with Microsoft Excel driver: weird behaviour involving missing values


From   "Eva Poen" <[email protected]>
To   [email protected]
Subject   st: -odbc- with Microsoft Excel driver: weird behaviour involving missing values
Date   Thu, 17 Aug 2006 16:48:31 +0100

Hello

Today I ran into a strange (and worrying) behaviour of -odbc- in
connection with Microsoft Excel spreadsheets. It'd be interesting to
know if anyone else can reproduce this.

Symptoms: If a variable in the spreadsheet contains a certain number
of missings, it will be imported with _all_ values set to missing,
even if it has non-missing values in Excel. I work under Windows XP
SP2, Stata 9.2, and Excel 2003.

I have no idea if this is a Microsoft/ODBC/Excel problem, or a Stata
-odbc- problem.

Here are the steps to reproduce:

1. Create a Microsoft Excel file with two columns (say, "Var1" and
"Var2") and 30 observations. Var1 holds numbers 1 to 30 in ascending
order. Var2 has empty cells in observations 1 to 25, and holds some
numbers (no strings) in observations 26 to 30.

2. Save spreadsheet and close Excel.

3. Launch Stata and issue (modify -dsn- and -table- accordingly)
-odbc load, dsn("Excel Files;DBQ=C:\Test\missingtest.xls") table("Sheet1$")-
followed by
-destring, replace-

4. -summarize-

Result on my machine: Var2 is reported to have missing values in all
30 observations.

Some experimenting revealed the following:
- Changing Var2 in observation 25 to a number in Excel results in a
correct import of the data. 25 missings in the first observations seem
to be a "magic" threshold here.
- Changing Var2 in observation 25 to an empty cell, and observation 26
to a string results in only the one string to be imported correctly;
observations 27 to 30 are set to missing.
- On my friend's computer (same setup, but Office 2002) the "magic"
threshold turned out to be 8 (that is, if the first 8 observations
have missings, the rest of the numeric data in this variable will be
discarded).

The version of the Microsoft Excel driver, as reported in Windows ODBC
Data Source Administrator, is 4.00.6304.00 .

I was not able to reproduce the problem with -odbc- in connection with
Microsoft Access; the same table will be correctly imported from
test.mdb using -odbc load, dsn(test) table(missingtest3)-.

Any comments welcome.

Best regards,
Eva Poen
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index