Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Re: RE: Length of a string


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Re: RE: Length of a string
Date   Mon, 4 Feb 2008 22:55:08 +0900

Lilian Galer wrote:

The problem is that I should prepare the file to be imported in a local made
software (in Oracle). This suppose to obtain a string in which I put the sql
command and all the information from my variables. I solved this problem by
creating two string variables and then in Excel I put the information from
both together. But, I want to do this in an automatic manner.

--------------------------------------------------------------------------------

Here are a couple of approaches that come to mind.  Others on the list might
offer better alternatives.

The first way is the easiest, but is the slowest..  In a -forvalues i =
1/`=_N' { . . . }- loop, concatenate the string variables in a local macro
variable and use -odbc exec()- to INSERT OR UPDATE the Oracle table row by
row directly from the Stata dataset in memory.  Use this method for small
datasets.  Take care not to introduce spurious extra spaces when
constructing the local macro variable:  the natural inclination is to type
`first_part' `second_part', but this will give you an unwanted space between
the components in the concatenated string; instead, omit the space:
`first_part'`second_part'.

The second way is analogous to your current method of using an Excel
worksheet as the intermediate, but is automatable (and obviates the need to
use Excel):  concatenate the string variables into an ASCII file
using -outfile, runtogether- and then import the ASCII file into Oracle
using the latter's bulk import command.  (I'm not familiar with Oracle, but
it must have such facility.)  You'll need to create delimiter variables in
Stata (variables of commas, tabs) and place them between the variables
containing data so that the ASCII file can be correctly parsed by Oracle's
bulk import command.  This way involves more work in setting it up, but as
mentioned both it and the bulk import can be automated, which is what you're
asking for; there's no tedious, irreproducible manual concatenation in
Excel.  This method is also the most efficient, especially if indexes,
transaction logging, etc., are toggled off for the bulk import.

A third way is to do the string concatenation in Oracle.  Insert the
unconcatenated string data directly into the database using -odbc insert- or
indirectly via -outsheet- / bulk import.  Use -odbc insert- for datasets of
up to a half-million observations or so and a dozen or so variables,
adjusting these limits according to your patience.  Use -outsheet- / bulk
import for anything larger.  For large datasets you can also go from the
Stata dataset file via Stat/Transfer into the database.  In my experience,
Stat/Transfer's ODBC is faster than Stata's -odbc-.

I've illustrated these three methods in the do-file below using Microsoft
Access as the stand-in for a relational database management system.
Attachments aren't permitted on Statalist, so the do-file downloads a zipped
template database from what I hope is a trustworthy site.  Direct it to
unzip the Access database file to your working directory when it prompts
you, and return to Stata without running the .bat file.  SQL statements
in -odbc exec()- cannot be continued on successive lines, and so I've used a
local macro variable, `SQL', to avoid wrapping during e-mailing.  The
do-file erases the downloaded files, but leaves behind the delimited ASCII
file from the second method so that you can take a look at it in a text
editor.  The third method's rudimentary illustration uses a temporary table
in the database (actually a conventional base table--temporary tables aren't
available in Access via ODBC).  You can implement the third method much
better in Oracle's PL/SQL, I'm sure.

Joseph Coveney

version 10
*
* Creating an illustrative dataset
*
clear *
set more off
set seed `=date("2008-02-04", "YMD")'
set obs 2
generate str my_key = string(floor(10^9 * uniform()), "%09.0f")
quietly foreach var of newlist first_chunk second_chunk {
   generate str `var' = ""
   forvalues i = 1/`=_N' {
       local my_string
       forvalues j = 1/244 {
           local my_string `my_string'`=char(65 + floor(26 * uniform()))'
       }
       replace `var' = "`my_string'" in `i'
   }
}
*
* Creating illustration database
*
copy http://www.cfsan.fda.gov/~download/opa-edi.exe dummy.exe
!dummy
erase dummy.exe
erase run.bat
local dsn MS Access Database;DBQ=edimdb.mdb;
// To prevent lines from wrapping in the e-mail . . .
local SQL CREATE TABLE MyTable (my_key CHAR(9) NOT NULL,
local SQL `SQL' my_string MEMO NULL)
odbc exec("`SQL'"), dsn("`dsn'")
local SQL CREATE INDEX mytable0 ON MyTable
local SQL `SQL' (my_key) WITH PRIMARY
odbc exec("`SQL'"), dsn("`dsn'")
*
* First method
*
foreach var of varlist _all {
   assert strpos(`var', "'") == 0
}
isid my_key
local SQL INSERT INTO MyTable VALUES
forvalues i = 1/`=_N' {
   local my_key = my_key[`i']
   local concatenated_string `=first_chunk[`i']'`=second_chunk[`i']'
   odbc exec("`SQL' ('`my_key'', '`concatenated_string'')"), dsn("`dsn'")
}
local SQL SELECT my_key, LEN(my_string) AS string_length FROM MyTable
odbc exec("`SQL'"), dsn("`dsn'")
*
*Second method
*
generate str comma = ", "
foreach var of varlist _all {
   assert strpos(`var', char(34)) == 0
}
generate str quote1 = char(34)
generate str quote2 = char(34)
* Begin digression
* If Oracle's bulk loader requires column names in the first row, . . .
tempfile tmpfil0
save `tmpfil0'
drop in 2/l
replace my_key = "my_key"
replace first_chunk = "my_string"
replace second_chunk = ""
append using `tmpfil0'
erase `tmpfil0'
* End digression
outfile my_key comma quote1 *_chunk quote2 ///
 using intermediate.csv, runtogether
*
* Third method
*
replace my_key = string(floor(10^9 * uniform()), "%09.0f")
isid my_key
odbc insert my_key *_chunk, table(TempTable) dsn("`dsn'") create
local SQL INSERT INTO MyTable SELECT my_key,
local SQL `SQL' [first_chunk] & [second_chunk] AS my_string
local SQL `SQL' FROM TempTable
odbc exec("`SQL'"), dsn("`dsn'")
odbc exec("DROP TABLE TempTable"), dsn("`dsn'")
local SQL SELECT my_key, LEN(my_string) AS string_length FROM MyTable
odbc exec("`SQL'"), dsn("`dsn'")
*
erase edimdb.mdb
exit


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index