Re: st: read text file with multiple spaces
Since you are already working with Perl, there may be an easier way out.
In this case, I would replace each run of spaces with "|" and use the delim() option of the insheet command.
In Perl you could say: perl -pe 's/ +/|/g' filename
(redirect the output to a new file and insheet that file).
If you wish to do it manually: in any text editor, replace every space
with "|" using find-replace, then replace "||" with "|" repeatedly until
no consecutive "|" remain, and then insheet the file.
On 8/19/05, Joseph Coveney <email@example.com> wrote:
> Yu Zhang wrote:
> It's a shame to ask, but does anyone know how to read
> data (text file) with multiple spaces between
> variables? The number of spaces may vary, so I cannot use:
> . insheet using file, delim(" ")
> The only way I figured out is to count the number of
> variables first (e.g., using Perl) and then use:
> . infile var1-var# using file
> Is there a more direct way?
> My guess would be to do the same in Stata as you would do in Perl to
> identify variables.
> For example, if there is only a single space between tokens within any
> variable, and there are at least two spaces (maybe more) between each pair
> of variables, then:
> 1. insheet into Stata into a single string variable (mind the limit for
> string variable length),
> 2. use Stata's limited regular expressions capability to convert multiple
> spaces to a convenient delimiter (choose one not otherwise present in the
> string variables' data),
> 3. convert multiple delimiters to single delimiters (mind blank cells),
> 4. export the delimited dataset as an ASCII spreadsheet from Stata (using
> the -noquote- option) to a temporary file, and then
> 5. re-import the delimited spreadsheet into Stata.
> Joseph Coveney
> * Creating demonstration spreadsheet
> set more off
> set obs 3
> generate str var1 = "column1  column2  column3"
> replace var1 = ///
> "This is the first column.  This is the second column.  " ///
> + "This is the third column." in 2
> replace var1 = ///
> "The first-second is two spaces.  " ///
> + "The second-third is four spaces.    " in 3
> * Check these last lines above--they might have line-wrapped
> * in the e-mail handler.
> outsheet using space_delimited_text_spreadsheet.prn, noname noquote
> * Begin here
> insheet using space_delimited_text_spreadsheet.prn, clear
> replace v1 = subinstr(v1, "  ", "; ", .)
> replace v1 = subinstr(v1, "; ; ", "; ", .)
> tempfile tmpfil0
> outsheet using `tmpfil0', nonames noquote
> insheet using `tmpfil0', names delimiter(";") clear
> erase `tmpfil0'
> list, clean
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/