Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Rob Shaw <rob.shaw.uk@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Importing subset of a pipe delimited textfile

Date: Wed, 17 Oct 2012 13:10:51 +0100

Maarten

The problem is not the pipes as such (otherwise I could just use the delimiter option in -insheet-); it's that the file is too large to use -insheet-. So I need to use -infile- to import my file in separate parts, but -infile- will only accept fixed-format files (as far as I understand).

Therefore, if I import my file using:

    infile str2 var1 _skip(1) str4 var2 _skip(1) str3 var3 _skip(1) str4 var4 using myfile in 1/1000000

I get nonsense, because the first record then gets filled with [1|, BCD|, 3|X, YZ]

Rob

Maarten wrote:

To give a concrete example: I stored Rob's example dataset in foo.raw. I then typed in Stata:

    filefilter foo.raw foo2.raw, from("|") to(\t) replace
    insheet using foo2.raw

The first line replaced all pipes in the file foo.raw with a tab and stored the resulting tab-delimited file in foo2.raw, and the second line read this tab-delimited file foo2.raw into Stata.

Hope this helps,
Maarten

On Wed, Oct 17, 2012 at 1:37 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Why is varying length of line a problem? So long as the same variables
> are represented on each line, I can see no problem.
>
> Also, -filefilter- has a tacit loop; you don't need to set it up for yourself.
>
> Nick
>
> On Wed, Oct 17, 2012 at 12:33 PM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>> Nick
>>
>> Thanks. Yes that would work but the problem is the varying length of
>> each line. So I need to get filefilter or another command to do one
>> of:
>>
>> x=0
>> counter=1
>> with "myfile.txt" {
>> y = position of 10000th EOL in `i'
>> save `i' from position x to y in "myfilepos"+counter+".txt"
>> x =y
>> }
>>
>> This would create files called myfilepos1, myfilepos2 etc each with
>> 10000 lines that I could then -insheet- with a delimiter(|) option.
>> But I don't know how to correctly specify the bit in the loop.
>>
>> OR
>>
>> for each line in "myfile.txt" {
>> find | and replace with a number of spaces depending on position in row
>> }
>>
>> This would make each line the same length so I could use -infile-
>>
>> Is there a way to use -filefilter- to achieve this?
>>
>> File sample:
>>
>> 1|ABCD|23|XYZ
>> 10|BCED|1|YZX
>> 30|DCHS|234|YBH
>> ....
>>
>> Thanks
>> Rob
>>
>>
>>>I'd use -filefilter- to change the pipes to something that -infile- can handle.
>>
>>>(Strictly, -in- is a qualifier, not an option.)
>>
>>>Nick
>>
>>>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>>
>>> I have a very large (around 4Gb) text file that has been pipe
>>> delimited. It won't all fit in memory so I want to process it in
>>> parts.
>>>
>>> For fixed datasets I would use infile with the in 1/10000000 option
>>> then 10000001/2000000 etc. However, this dataset has been pipe
>>> delimited so I would need to use insheet, but insheet doesn't seem to
>>> permit the "in" option.

--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
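For readers landing on this thread, here is a minimal sketch of how the two suggestions above (-filefilter- to change the delimiter, then the -in- qualifier to read in parts) might be combined. It is not taken from any of the posts. The file names myfile.txt and myfile_comma.txt, the variable names and storage types, the output names part1 and part2, and the chunk size of one million lines are all assumptions for illustration. It relies on free-format -infile- accepting comma-separated values, so it would not work as written if any string field itself contains commas or blanks.

```stata
* Sketch only: file names, variable names, storage types, and the
* chunk size are assumptions, not taken from the thread.

* Step 1: replace every pipe with a comma so that free-format
* -infile- can separate the values on each line.
filefilter myfile.txt myfile_comma.txt, from("|") to(",") replace

* Step 2: read only the first million records and save them.
infile long var1 str4 var2 long var3 str3 var4 ///
    using myfile_comma.txt in 1/1000000, clear
save part1, replace

* Step 3: read the next million records the same way.
infile long var1 str4 var2 long var3 str3 var4 ///
    using myfile_comma.txt in 1000001/2000000, clear
save part2, replace
```

Commas are used here instead of the tab in Maarten's example on the assumption that some fields may be empty: free-format -infile- treats a run of whitespace as a single separator, which could shift columns, whereas two adjacent commas are read as a missing value. If no field is ever empty, to(\t) should serve equally well.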

**Follow-Ups**: **Re: st: Importing subset of a pipe delimited textfile** *From:* Maarten Buis <maartenlbuis@gmail.com>
