Re: st: Importing subset of a pipe delimited textfile


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Importing subset of a pipe delimited textfile
Date   Wed, 17 Oct 2012 12:37:54 +0100

Why is varying length of line a problem? So long as the same variables
are represented on each line, I can see no problem.

Also, -filefilter- has a tacit loop; you don't need to set it up for yourself.
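
A minimal sketch of that approach, assuming the four-column layout shown in the file sample below and that no field contains a comma or a blank. The output file name and the variable names (id, code, n, tag) are hypothetical. -filefilter- converts the whole file in one pass, and free-format -infile- then reads a block of observations with the -in- qualifier:

```stata
* Sketch only: myfile_csv.txt and the variable names are assumptions,
* not taken from the thread.

* Replace every pipe with a comma; -filefilter- scans the whole file itself,
* so no explicit loop over lines is needed.
filefilter myfile.txt myfile_csv.txt, from("|") to(",") replace

* Free-format -infile- splits values on commas (or blanks/tabs), so the
* varying line length does not matter. Read only the first 10000 observations.
infile id str4 code n str3 tag using myfile_csv.txt in 1/10000, clear
```

The next block would be read the same way with in 10001/20000, and so on. This relies on every line carrying the same four values, since free-format -infile- builds observations from consecutive values rather than from column positions.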

Nick

On Wed, Oct 17, 2012 at 12:33 PM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
> Nick
>
> Thanks. Yes that would work but the problem is the varying length of
> each line. So I need to get filefilter or another command to do one
> of:
>
> x = 0
> counter = 1
> with "myfile.txt" {
>  y = position of 10000th EOL after x
>  save from position x to y in "myfilepos"+counter+".txt"
>  x = y
>  counter = counter + 1
> }
>
> This would create files called myfilepos1, myfilepos2 etc each with
> 10000 lines that I could then -insheet- with a delimiter(|) option.
> But I don't know how to correctly specify the bit in the loop.
>
> OR
>
> for each line in "myfile.txt" {
>  find | and replace with a number of spaces depending on position in row
> }
>
> This would make each line the same length so I could use -infile-
>
> Is there a way to use -filefilter- to achieve this?
>
> File sample:
>
> 1|ABCD|23|XYZ
> 10|BCED|1|YZX
> 30|DCHS|234|YBH
> ....
>
> Thanks
> Rob
>
>
>>I'd use -filefilter- to change the pipes to something that -infile- can handle.
>
>>(Strictly, -in- is a qualifier, not an option.)
>
>>Nick
>
>>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>
>> I have a very large (around 4Gb) text file that has been pipe
>> delimited. It won't all fit in memory so I want to process it in
>> parts.
>>
>> For fixed datasets I would use infile with the in 1/10000000 option
>> then 10000001/20000000 etc. However, this dataset has been pipe
>> delimited so I would need to use insheet, but insheet doesn't seem to
>> permit the "in" option.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

