Stata The Stata listserver

RE: st: read text file with multiple spaces


From   "Donald Spady" <dspady@ualberta.ca>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: read text file with multiple spaces
Date   Fri, 19 Aug 2005 08:09:05 -0600

If the file has multiple spaces between the variables, but the same number
of spaces in each record (i.e., the same format for every record), then you
should be able to read it (at least into SPSS; I don't know about Stata)
using the old Fortran-style input format. E.g., 3F2.1 6X F3.0 means: read
three 2-digit (or string length 2) variables, then skip 6 spaces, and read a
3-digit variable. I know it is something like this, but it has been a while
since I have done it.
Don Spady
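
For readers wanting the Stata counterpart of that fixed-format idea, the
-infix- command reads variables from fixed column positions. The lines below
are only a sketch: the file name fixed_width.txt and the variable names a, b,
c and d are hypothetical, and the column ranges assume the 3F2.1 6X F3.0
layout described above (three 2-character fields, 6 skipped columns, one
3-character field).

```stata
* Sketch only: file name and variable names are made up for illustration.
* Columns 1-2, 3-4 and 5-6 hold the three 2-character fields,
* columns 7-12 are skipped, and columns 13-15 hold the 3-character field.
infix a 1-2 b 3-4 c 5-6 d 13-15 using fixed_width.txt, clear
list, clean
```

Note that -infix- reads the digits as they appear in the file; if the data
rely on the implied decimal place of a Fortran F2.1 field, that scaling would
have to be applied afterwards.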

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joseph Coveney
Sent: Thursday, August 18, 2005 10:11 PM
To: Statalist
Subject: Re: st: read text file with multiple spaces

Yu Zhang wrote:

It's a shame to ask, but does anyone know how to read
data (text file) with multiple spaces between
variables?  The number of spaces may vary, so I cannot
use:

. insheet using file, delim(" ")

The only way I figured out is to count the number of
variables first (e.g., using Perl) and then use:

. infile var1-var# using file

Is there a more direct way?

--------------------------------------------------------------------------------

My guess would be to do the same in Stata as you would do in Perl to
identify variables.

For example, if there is only a single space between tokens within any string
variable, and there are at least two spaces (maybe more) between each pair
of variables, then:
1. -insheet- the file into Stata as a single string variable (mind the limit
on string variable length),
2. use Stata's limited regular expressions capability to convert multiple
spaces to a convenient delimiter (choose one not otherwise present in the
string variables' data),
3. convert multiple delimiters to single delimiters (mind blank cells),
4. export the delimited dataset as an ASCII spreadsheet from Stata (using
the -noquote- option) to a temporary file, and then
5. re-import the delimited spreadsheet into Stata.

Joseph Coveney

* Creating demonstration spreadsheet
clear
set more off
set obs 3
generate str var1 = "column1  column2    column3"
replace var1 = ///
  "This is the first column.  This is the second column.    " ///
  + "This is the third column." in 2
replace var1 = ///
  "The first-second is two spaces.  " ///
  + "The second-third is four spaces.    "  in 3
* Check these last lines above--they might have line-wrapped
* in the e-mail handler.
outsheet using space_delimited_text_spreadsheet.prn, noname noquote
clear
*
* Begin here
*
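* With no tabs or commas in the file, each line lands in one string variable, v1
insheet using space_delimited_text_spreadsheet.prn
* Turn each pair of spaces into "; ", then collapse doubled delimiters.
* Note: one pass of each covers the two- and four-space gaps in this demo;
* longer runs of spaces would need the second -replace- repeated.
replace v1 = subinstr(v1, "  ", "; ", .)
replace v1 = subinstr(v1, "; ; ", "; ", .)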
tempfile tmpfil0
outsheet using `tmpfil0', nonames noquote
insheet using `tmpfil0', names delimiter(";") clear
erase `tmpfil0'
list, clean
exit
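
Step 2 of the outline mentions regular expressions, whereas the demonstration
uses subinstr(). A variant along the regular-expression route might look like
the sketch below. It is not part of the original do-file: it assumes v1 holds
one raw line per observation, that ";" never occurs in the data, and that
-split- (with the arbitrary stub name col) is an acceptable substitute for the
outsheet/insheet round trip. Unlike that round trip, it leaves the header line
as an ordinary first observation.

```stata
* Sketch only: assumes v1 is the single string variable read by -insheet-
* and that ";" does not appear anywhere in the data.
replace v1 = trim(v1)
local left = 1
while `left' > 0 {
    * regexr() replaces only the first match, so repeat until no run
    * of two or more spaces remains in any observation
    quietly replace v1 = regexr(v1, "  +", ";")
    quietly count if regexm(v1, "  +")
    local left = r(N)
}
* Break v1 at each ";" into new string variables col1, col2, ...
split v1, parse(";") generate(col)
list col*, clean
```

Because the pattern matches a whole run of spaces at once, gaps of any length
of two or more collapse to a single delimiter, so no separate clean-up of
doubled delimiters is needed.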


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




