Statalist The Stata Listserver



st: Keeping significant spaces with -infile- and -infix-


From   Joseph Coveney <jcoveney@bigplanet.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   st: Keeping significant spaces with -infile- and -infix-
Date   Fri, 18 May 2007 12:21:41 +0900

How can significant leading and trailing (that is, non-embedded) spaces of
string variables be retained when importing fixed-width ASCII files?

-infile- with a dictionary strips both the leading and trailing spaces of
string variables.  An example of when this is inappropriate is illustrated
below.  The example is a toy version of a real data-management task in which
long-string variables are read into Stata (in pieces of 244 characters
or fewer) from fixed-width ASCII files, manipulated as needed using Stata,
and then exported to an ODBC-compliant application where they are
reassembled by concatenation using SQL.

-infix- strips the spaces in the same way.

In the past, I've used -filefilter- to substitute char(160) as a
place-holder (space-holder), imported the substituted file, and then
used -subinstr()- to back out the substitutions.  This workaround is also
illustrated below.  I can't help thinking that there must be a much more
straightforward approach that I'm overlooking.  (Given the larger task at
hand, employing -file- or Mata's -cat()- would seem at least as convoluted a
workaround as the character-substitution one.)

The user manual implies that at least leading blanks will always be skipped
by -infile-.  It seems as if this problem could have come up before on the
list, but a search on the keywords "spaces" and "fixed width" didn't turn
anything up.

Joseph Coveney

---------------begin Mary.dat-------------------
1234567890123456789012345678901234567
Mary had a little lamb.  Its        1
fleece was white as snow, and       2
everywhere that Mary went, the      3
lamb was sure to go.                4
----------------end Mary.dat-------------------

----------------begin Mary.dct-----------------
infile dictionary using Mary.dat {
   _firstlineoffile(2)
   str5 a_01 %5s
   str5 a_02 %5s
   str5 a_03 %5s
   str5 a_04 %5s
   str5 a_05 %5s
   str5 a_06 %5s
   str5 a_07 %5s "I'm all blank; -drop- me"
   str2 line %2s "I'm a number; -destring- me"
}
------------------end Mary.dct-----------------
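
For reference, a dictionary-free -infix- call that reads the same columns
might look roughly like the sketch below.  The column ranges are inferred
from the %5s and %2s widths in Mary.dct, and the file name Mary_infix.do is
only illustrative.  As noted above, it strips the same spaces.

---------------begin Mary_infix.do--------------
* Sketch only: same layout as Mary.dct, expressed as -infix- column ranges
infix 2 firstlineoffile                          ///
   str a_01  1-5   str a_02  6-10  str a_03 11-15 ///
   str a_04 16-20  str a_05 21-25  str a_06 26-30 ///
   str a_07 31-35  str line 36-37                 ///
   using Mary.dat, clear
list, noobs separator(0)
-----------------end Mary_infix.do--------------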

---------------begin Mary.do-------------------
* Doesn't work: leading and trailing spaces of each piece are stripped
quietly infile using Mary.dct, clear
generate str A = a_01 + a_02 + a_03 + a_04 + a_05 + a_06
list A, noobs separator(0)
* Workaround: swap spaces for char(160) before import, restore afterwards
tempfile tmpfil0
filefilter Mary.dat "`tmpfil0'", from(" ") to(\160d)
quietly infile using Mary.dct, using("`tmpfil0'") clear
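foreach var of varlist _all {
   * Drop variables made up entirely of placeholders in every observation
   capture assert indexnot(`var', char(160)) == 0
   if !_rc {
       drop `var'
       continue
   }
   * Restore spaces wherever a placeholder remains.  subinstr() with "."
   * replaces every occurrence, so the loop body runs at most once.
   capture assert strpos(`var', char(160)) == 0
   local has_it = _rc
   while `has_it' {
       quietly replace `var' = subinstr(`var', char(160), " ", .)
       capture assert strpos(`var', char(160)) == 0
       local has_it = _rc
   }
   * Convert to numeric where possible; other variables stay as strings
   quietly destring `var', replace
}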
generate str A = a_01 + a_02 + a_03 + a_04 + a_05 + a_06
list A, noobs separator(0)
exit
--------------end Mary.do-------------------------
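
For comparison, the Mata -cat()- route mentioned above might look roughly
like the following for the toy file.  This is a sketch under assumptions,
not a tested solution to the larger task: the file name Mary_mata.do, the
single 30-character text field, and the variable names A and line are
illustrative.  Mata's substr() returns the requested columns without
trimming, so leading and trailing blanks should survive into the Stata
variable.

-------------begin Mary_mata.do-----------------
* Sketch only; names and widths are illustrative assumptions
clear
mata:
// Read Mary.dat from line 2 onward (line 1 is the column ruler)
lines = cat("Mary.dat", 2)
// Columns 1-30 hold the text; substr() keeps blanks as they are
text = substr(lines, 1, 30)
// Columns 36-37 hold the line number
num = strtoreal(strtrim(substr(lines, 36, 2)))
// Create the observations and variables, then store the values
st_addobs(rows(lines))
(void) st_addvar("str30", "A")
st_sstore(., "A", text)
(void) st_addvar("byte", "line")
st_store(., "line", num)
end
list A line, noobs separator(0)
---------------end Mary_mata.do-----------------

In the real task each piece would be cut with its own substr() call, which
is where this approach may become as convoluted as the substitution one.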
