Stata The Stata listserver

Re: st: read text file with multiple spaces


From   Yu Zhang <whgyu1@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: read text file with multiple spaces
Date   Fri, 19 Aug 2005 06:14:39 -0700 (PDT)

Thanks for all the wonderful suggestions!
Unfortunately, since I have multiple data files and a
fairly large number of variables per file, I guess I
will stay with my old way.

Yu

--- Daniel Egan <degan.stata@gmail.com> wrote:

> Hi Yu, 
> 
> Perhaps as succinct, if not more so, is to open the
> text file in a text editor and replace every instance
> of "  " (two spaces) with " " (one space).
> 
> You may have to repeat this until the editor reports
> that no instances of "  " (two spaces) could be found.
> 
> This assumes that you do not have a string variable
> (for example, an ID variable) that contains two
> meaningful spaces, such as "ABCD  EFG HIJK".
> 
> That's the only caveat I can think of. 
> 
> Best, 
> 
> Dan
> 
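Dan's repeated find-and-replace can also be scripted inside Stata with the -filefilter- command instead of a text editor. This is a minimal sketch, not Dan's method verbatim: the files dan_raw.txt, dan_clean.txt and dan_pass.txt are hypothetical, no string value is assumed to contain meaningful double spaces (Dan's caveat), and no line is assumed to begin with a blank. Each pass writes to a scratch file that is copied back, and the loop stops once a pass reports zero occurrences of "  ".

```stata
* Sketch: collapse "  " to " " until none remain, then insheet
* with a single-space delimiter. File names are hypothetical.
tempname fh
file open `fh' using dan_raw.txt, write text replace
file write `fh' "1   2.5    3" _n
file write `fh' "4  5      6" _n
file close `fh'

copy dan_raw.txt dan_clean.txt, replace
local hits = 1
while `hits' > 0 {
    * One pass: every non-overlapping "  " becomes " "
    filefilter dan_clean.txt dan_pass.txt, from("  ") to(" ") replace
    local hits = r(occurrences)
    copy dan_pass.txt dan_clean.txt, replace
}
erase dan_pass.txt

* Exactly one blank now separates fields, so delimiter(" ") works
insheet using dan_clean.txt, delimiter(" ") nonames clear
list, clean
```

A run of n blanks shrinks to about half its length per pass, so even long runs need only a few passes.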
> On 8/19/05, Jayesh Kumar <theindianeconomist@gmail.com> wrote:
> > Since you are already working with Perl, you could
> > have found an easier way out.
> > In this case, I'd replace each run of spaces with "|"
> > and use the delimiter() option of insheet.
> > In Perl you could say:
> >   perl -lpe 's/^ +| +$//g; s/ +/|/g' filename > newfile
> > 
> > If you wish to do it manually: in any text editor,
> > replace every space with "|" using find-and-replace,
> > then keep replacing "||" with "|" until no consecutive
> > "|" remain, and then insheet the file.
> > 
> > HTH,
> > Jayesh
> > ===================
> > Jayesh Kumar
> > 
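If Perl is not at hand, the same filter (every run of blanks becomes "|") can be written line by line with Stata's -file- commands. This is a sketch under stated assumptions, not Jayesh's code: jk_raw.txt and jk_piped.txt are hypothetical names, and no field is assumed to contain a blank, "|", or a double quote, since each line passes through a local macro. -trim()- drops leading and trailing blanks and -itrim()- squeezes internal runs of blanks to one (newer Stata versions also name it -stritrim()-).

```stata
* Sketch: Stata-only stand-in for the Perl one-liner.
* File names are hypothetical.
tempname fh fin fout
file open `fh' using jk_raw.txt, write text replace
file write `fh' "10   20  30" _n
file write `fh' "40 50      60" _n
file close `fh'

file open `fin' using jk_raw.txt, read text
file open `fout' using jk_piped.txt, write text replace
file read `fin' line
while r(eof) == 0 {
    * Trim the ends, squeeze inner runs to one blank, then blank -> "|"
    local line = subinstr(itrim(trim("`line'")), " ", "|", .)
    file write `fout' "`line'" _n
    file read `fin' line
}
file close `fin'
file close `fout'

insheet using jk_piped.txt, delimiter("|") nonames clear
list, clean
```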
> > On 8/19/05, Joseph Coveney <jcoveney@bigplanet.com> wrote:
> > > Yu Zhang wrote:
> > >
> > > It's a shame to ask, but does anyone know how to read
> > > data (text file) with multiple spaces between
> > > variables?  The number of spaces may vary, so I cannot
> > > use:
> > >
> > > . insheet using file, delim(" ")
> > >
> > > The only way I figured out is to count the number of
> > > variables first (e.g., using Perl) and then use:
> > >
> > > . infile var1-var# using file
> > >
> > > Is there a more direct way?
> > >
> > >
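The "count first, then -infile-" approach in the question can also be done entirely inside Stata. The sketch below is an illustration, not Yu's code: it reads only the first line with -file read-, counts its blank-separated words with the extended macro function "word count", and passes that number to free-format -infile-, which treats any run of blanks as a single separator. The file name yu_demo.txt is hypothetical, and the sketch assumes every column is numeric (string columns would need str# declarations in the -infile- varlist), there is no header line, and every record has the same number of fields.

```stata
* Sketch: count fields on line 1, then read with free-format infile.
* "yu_demo.txt" is a hypothetical file name.
tempname fh
file open `fh' using yu_demo.txt, write text replace
file write `fh' "1   2.5  3" _n
file write `fh' "4  5        6" _n
file close `fh'

* Read only the first line and count its blank-separated words
file open `fh' using yu_demo.txt, read text
file read `fh' line
file close `fh'
local nvars : word count `line'
display "`nvars' variables found"

* Free-format infile: any number of blanks separates values
clear
infile var1-var`nvars' using yu_demo.txt
list, clean
```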
> > > --------------------------------------------------------------------------------
> > >
> > > My guess would be to do in Stata the same thing you
> > > would do in Perl to identify variables.
> > >
> > > For example, if there is only a single space between
> > > tokens within any string variable, and there are at
> > > least two spaces (maybe more) between each pair of
> > > variables, then:
> > > 1. insheet the file into a single string variable
> > >    (mind the limit on string variable length),
> > > 2. use Stata's limited regular-expression capability
> > >    to convert multiple spaces to a convenient
> > >    delimiter (choose one not otherwise present in the
> > >    string variables' data),
> > > 3. convert multiple delimiters to single delimiters
> > >    (mind blank cells),
> > > 4. export the delimited dataset as an ASCII spreadsheet
> > >    from Stata (using the -noquote- option) to a
> > >    temporary file, and then
> > > 5. re-import the delimited spreadsheet into Stata.
> > >
> > > Joseph Coveney
> > >
> > > * Creating demonstration spreadsheet
> > > clear
> > > set more off
> > > set obs 3
> > > generate str var1 = "column1  column2    column3"
> > > replace var1 = ///
> > >  "This is the first column.  This is the second column.    " ///
> > >  + "This is the third column." in 2
> > > replace var1 = ///
> > >  "The first-second is two spaces.  " ///
> > >  + "The second-third is four spaces.    "  in 3
> > > * Check these last lines above--they might have line-wrapped
> > > * in the e-mail handler.
> > > outsheet using space_delimited_text_spreadsheet.prn, nonames noquote
> > > clear
> > > *
> > > * Begin here
> > > *
> > > * Each whole line arrives as one string variable, v1
> > > insheet using space_delimited_text_spreadsheet.prn, nonames
> > > * Two spaces become "; ", then "; ; " collapses to "; ".
> > > * A single pass of each handles the two- and four-space runs used here.
> > > replace v1 = subinstr(v1, "  ", "; ", .)
> > > replace v1 = subinstr(v1, "; ; ", "; ", .)
> > > * Write the ";"-delimited lines out, then read them back in
> > > tempfile tmpfil0
> > > outsheet using `tmpfil0', nonames noquote
> > > insheet using `tmpfil0', names delimiter(";") clear
> > > erase `tmpfil0'
> > > list, clean
> > > exit
> > >
> > >
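Joseph's two -subinstr()- passes handle the runs in his demonstration, but runs of six or more blanks still leave "; ; " behind after one pass. Below is a sketch of steps 2 and 3 as a loop, using -regexr()- with the pattern "  +" (two or more blanks); it is an alternative under assumptions, not Joseph's code. It assumes ";" never occurs in the data, single blanks occur only inside string values, each line fits in one string variable, and jc_demo.txt is a hypothetical file name. Because -regexr()- replaces only the first match in each observation, the loop repeats until -count- finds no run left.

```stata
* Sketch: turn every run of 2+ blanks into a single ";".
* "jc_demo.txt" is a hypothetical file name.
tempname fh
file open `fh' using jc_demo.txt, write text replace
file write `fh' "id  name      score" _n
file write `fh' "1  John Smith   42" _n
file close `fh'

* Read each whole line into the single string variable v1
insheet using jc_demo.txt, nonames clear
replace v1 = trim(v1)

* regexr() replaces only the first match, so repeat until none remain
quietly count if regexm(v1, "  +")
while r(N) > 0 {
    quietly replace v1 = regexr(v1, "  +", ";")
    quietly count if regexm(v1, "  +")
}

* Round-trip through a ";"-delimited temporary file
tempfile tmp
outsheet using `tmp', nonames noquote
insheet using `tmp', names delimiter(";") clear
list, clean
```

The single blank inside "John Smith" survives, because the pattern only matches two or more blanks in a row.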
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/
> > >
> > 
> >
> 
> 






© Copyright 1996–2014 StataCorp LP