Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Importing subset of a pipe delimited textfile |
Date | Wed, 17 Oct 2012 12:37:54 +0100 |
Why is varying length of line a problem? So long as the same variables
are represented on each line, I can see no problem.

Also, -filefilter- has a tacit loop; you don't need to set it up for yourself.

Nick

On Wed, Oct 17, 2012 at 12:33 PM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
> Nick
>
> Thanks. Yes that would work but the problem is the varying length of
> each line. So I need to get filefilter or another command to do one
> of:
>
> x=0
> counter=1
> with "myfile.txt" {
> y = position of 10000th EOL in `i'
> save `i' from position x to y in "myfilepos"+counter+".txt"
> x =y
> }
>
> This would create files called myfilepos1, myfilepos2 etc each with
> 10000 lines that I could then -insheet- with a delimiter(|) option.
> But I don't know how to correctly specify the bit in the loop.
>
> OR
>
> for each line in "myfile.txt" {
> find | and replace with a number of spaces depending on position in row
> }
>
> This would make each line the same length so I could use -infile-
>
> Is there a way to use -filefilter- to achieve this?
>
> File sample:
>
> 1|ABCD|23|XYZ
> 10|BCED|1|YZX
> 30|DCHS|234|YBH
> ....
>
> Thanks
> Rob
>
>
>>I'd use -filefilter- to change the pipes to something that -infile- can handle.
>
>>(Strictly, -in- is a qualifier, not an option.)
>
>>Nick
>
>>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>
>> I have a very large (around 4Gb) text file that has been pipe
>> delimited. It won't all fit in memory so I want to process it in
>> parts.
>>
>> For fixed datasets I would use infile with the in 1/10000000 option
>> then 10000001/2000000 etc. However, this dataset has been pipe
>> delimited so I would need to use insheet, but insheet doesn't seem to
>> permit the "in" option.
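
The approach suggested in this thread (convert the pipes with -filefilter-, then read blocks of lines with -infile- and the -in- qualifier) might look roughly like the sketch below. The output file name, the variable names (id, code, n, tag), the str10 widths and the block size are assumptions made for illustration, based only on the four-column file sample; they are not from the original posts. The sketch also assumes that no field contains commas or embedded blanks, since free-format -infile- treats both as separators.

```stata
* Hypothetical names throughout: adjust files, variables and string widths.

* Step 1: turn every pipe into a comma. -filefilter- works through the
* whole file by itself, so no explicit loop over lines is needed, and
* lines of different lengths are not a problem.
filefilter myfile.txt myfile_comma.txt, from("|") to(",") replace

* Step 2: read the converted file in blocks. Free-format -infile-
* accepts comma-separated values and allows the -in- qualifier.
clear
infile id str10 code n str10 tag using myfile_comma.txt in 1/10000000
save part1, replace

clear
infile id str10 code n str10 tag using myfile_comma.txt in 10000001/20000000
save part2, replace
```

Each saved part can then be processed separately. If fields can contain commas or spaces, a different replacement character or quoting of strings would be needed, which this sketch does not handle.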
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/