Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: RE: st: Appending several excel data sets into one


From   Joseph Coveney <jcoveney@bigplanet.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   Re: RE: st: Appending several excel data sets into one
Date   Mon, 17 Sep 2007 20:38:03 -0700

Allan Reese wrote:

Couple of comments on original question "How can one append several Excel
sheets in one stata data set in a simple program ..."

1. With great care! Excel is so easy to use because it doesn't impose
typing, and using multiple sheets just increases the problem. Unless the
sheets were very well supported by VBA and macros (about as common as Yeti
or Loch Ness monster), you'll find that apparently similar variables are
numeric in one file, strings (of varying lengths) in another, or a mixture
of numbers with some strings that read into Stata as missing. Dates are a
nightmare. My experience has been that Excel sheets have very little chance
of doing a clean join.

2. If workbooks contain multiple sheets, you would like to access the sheet
names. Excel doesn't provide this facility, but an add-in called
ASAP-utilities http://www.asap-utilities.com/ includes one to create an
index sheet listing all sheets in the workbook. We used this to write
macros to copy data from multiple sheets within a book into one sheet for
export.

--------------------------------------------------------------------------------

No disagreement about the challenges involved in receiving data in the form
of Excel workbooks, but there are a couple of things that make it a little
easier.

First, you can force Excel to provide you with everything in a mixed-type
column as a string. This will avoid mixed types being read into Stata
inappropriately as missing. See
http://dailydoseofexcel.com/archives/2004/06/03/external-data-mixed-data-types/
or
http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/
for how; scroll down the Web page until you see the discussion about ODBC,
which Stata uses. (The technique is shown in the do-file below.)

Second, Stata's -odbc query- will provide a list of worksheets in an Excel
workbook, and you can retrieve the information in a round-about manner for
use in looping through multiple sheets in a workbook when the names of the
sheets are unknown beforehand. This is also shown below in the do-file.
(The same general workaround can also be used in retrieving variable name
and storage type information from -describe using-, one of the earlier questions on the list this past week.)

The probe workbook ("Test.xls") contains two worksheets with two columns
(column_1 and column_2), each of mixed data types (string, numeric, date).
I cannot attach the workbook, but you can see its contents in the -list-
output below the do-file. The do-file appends all of the data in both
worksheets into one Stata dataset, all as strings. Once the data are safely
ensconced in Stata, you can -destring- or whatever, as needed.

Again, no contention about the data-management abuses facilitated by Excel,
but like the original poster, I'm often called upon to receive data in that
format despite my druthers.

Joseph Coveney

clear *
set more off
tempfile tmpfil0
*
local dsn Excel Files;DBQ=Test.xls;DefaultDir=F:/;
*
* Retrieving the names of workbooks into a macro
*
quietly log using `tmpfil0', text
odbc query "`dsn'"
quietly log close
*
infix str244 a 1-244 using `tmpfil0', clear
generate byte keep = sum(strpos(a, "---") == 1)
keep if keep == 1
drop in 1
compress
levelsof a, local(spreadsheets)
*
* Looping through the list, appending
*
* Putting some of the long SQL statement
* into a macro in order to avoid wrapping
* in the e-mail to the list
local prefix [Excel 8.0;HDR=YES;IMEX=1;Database=F:\Test.xls;].
*
local sheet 0
foreach spreadsheet of local spreadsheets {
odbc load, ///
exec("SELECT * FROM `prefix'[`spreadsheet'];") dsn("`dsn'") clear
generate byte sheet = `++sheet'
if (`sheet' == 1) {
save `tmpfil0', replace
}
else {
append using `tmpfil0'
save `tmpfil0', replace
}
}
erase `tmpfil0'
list, noobs string(20) abbreviate(15) sepby(sheet)
exit

Output from the do-file using the probe Excel workbook, Test.xls

. list, noobs string(20) abbreviate(15) sepby(sheet)

+-------------------------------------------+
| column_1 column_2 sheet |
|-------------------------------------------|
| .a Hello, world! 2 |
| .b 4 2 |
| .c 01-May-1969 2 |
| 7890 .r 2 |
|-------------------------------------------|
| 12345 no stinkin' badges 1 |
| my fair lady 2007-09-17 1 |
+-------------------------------------------+

. exit

end of do-file


*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index