
Re: st: read text file with multiple spaces


From   Joseph Coveney <jcoveney@bigplanet.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   Re: st: read text file with multiple spaces
Date   Fri, 19 Aug 2005 13:10:43 +0900

Yu Zhang wrote:

It's a shame to ask, but does anyone know how to read
data (text file) with multiple spaces between
variables?  The number of spaces may vary, so I cannot
use:

. insheet using file, delim(" ")

The only way I figured out is to count the number of
variables first (e.g., using Perl) and then use:

. infile var1-var# using file

Is there a more direct way?

--------------------------------------------------------------------------------

My guess would be to do the same in Stata as you would do in Perl to
identify variables.

For example, if there is only a single space between tokens within any
string variable, and there are at least two spaces (maybe more) between
each pair of variables, then:
1. -insheet- the file into Stata as a single string variable (mind the
limit on string variable length),
2. use Stata's string functions (-subinstr()-, or its limited regular
expression capability) to convert multiple spaces to a convenient delimiter
(choose one not otherwise present in the string variables' data),
3. convert multiple delimiters to single delimiters (mind blank cells),
4. export the delimited dataset as an ASCII spreadsheet from Stata (using
the -noquote- option) to a temporary file, and then
5. re-import the delimited spreadsheet into Stata.

Joseph Coveney

* Creating demonstration spreadsheet
clear
set more off
set obs 3
generate str var1 = "column1  column2    column3"
replace var1 = ///
  "This is the first column.  This is the second column.    " ///
  + "This is the third column." in 2
replace var1 = ///
  "The first-second is two spaces.  " ///
  + "The second-third is four spaces.    "  in 3
* Check these last lines above--they might have line-wrapped
* in the e-mail handler.
outsheet using space_delimited_text_spreadsheet.prn, noname noquote
clear
*
* Begin here
*
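* Read each whole line into one string variable, v1
insheet using space_delimited_text_spreadsheet.prn
* Two spaces -> "; ", then collapse the doubled "; ; " from four-space gaps
replace v1 = subinstr(v1, "  ", "; ", .)
replace v1 = subinstr(v1, "; ; ", "; ", .)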
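* A possible extension, sketched under assumptions not in the original post:
* if a gap can be wider than four spaces, the single pass above can still
* leave "; ; " behind, so repeat the collapse until no row contains it.
* Like the pass above, this merges truly blank cells into their neighbors.
* (-strpos()- needs Stata 9 or later.)
quietly count if strpos(v1, "; ; ") > 0
while r(N) > 0 {
    quietly replace v1 = subinstr(v1, "; ; ", "; ", .)
    quietly count if strpos(v1, "; ; ") > 0
}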
tempfile tmpfil0
outsheet using `tmpfil0', nonames noquote
insheet using `tmpfil0', names delimiter(";") clear
erase `tmpfil0'
list, clean
exit

