Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Importing subset of a pipe delimited textfile
Date: Wed, 17 Oct 2012 13:50:41 +0200

To give a concrete example: I stored Rob's example dataset in foo.raw. I then typed in Stata:

    filefilter foo.raw foo2.raw, from("|") to(\t) replace
    insheet using foo2.raw

The first line replaced all pipes in the file foo.raw with tabs and stored the resulting tab-delimited file in foo2.raw, and the second line read this tab-delimited file foo2.raw into Stata.

Hope this helps,
Maarten

On Wed, Oct 17, 2012 at 1:37 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Why is varying length of line a problem? So long as the same variables
> are represented on each line, I can see no problem.
>
> Also, -filefilter- has a tacit loop; you don't need to set it up for yourself.
>
> Nick
>
> On Wed, Oct 17, 2012 at 12:33 PM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>> Nick
>>
>> Thanks. Yes that would work but the problem is the varying length of
>> each line. So I need to get filefilter or another command to do one
>> of:
>>
>> x=0
>> counter=1
>> with "myfile.txt" {
>> y = position of 10000th EOL in `i'
>> save `i' from position x to y in "myfilepos"+counter+".txt"
>> x =y
>> }
>>
>> This would create files called myfilepos1, myfilepos2 etc each with
>> 10000 lines that I could then -insheet- with a delimiter(|) option.
>> But I don't know how to correctly specify the bit in the loop.
>>
>> OR
>>
>> for each line in "myfile.txt" {
>> find | and replace with a number of spaces depending on position in row
>> }
>>
>> This would make each line the same length so I could use -infile-
>>
>> Is there a way to use -filefilter- to achieve this?
>>
>> File sample:
>>
>> 1|ABCD|23|XYZ
>> 10|BCED|1|YZX
>> 30|DCHS|234|YBH
>> ....
>>
>> Thanks
>> Rob
>>
>>
>>>I'd use -filefilter- to change the pipes to something that -infile- can handle.
>>
>>>(Strictly, -in- is a qualifier, not an option.)
>>
>>>Nick
>>
>>>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>>
>>> I have a very large (around 4Gb) text file that has been pipe
>>> delimited. It won't all fit in memory so I want to process it in
>>> parts.
>>>
>>> For fixed datasets I would use infile with the in 1/10000000 option
>>> then 10000001/2000000 etc. However, this dataset has been pipe
>>> delimited so I would need to use insheet, but insheet doesn't seem to
>>> permit the "in" option.

--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
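For anyone who wants to try the two commands without a data file at hand, here is a minimal sketch that first writes Rob's three sample lines to foo.raw and then runs the conversion. The -file write- setup, the -tab-, -nonames- and -clear- options, and the final -list- are additions for illustration, not part of the original post; the last command shows the alternative Rob mentions, reading the pipe-delimited file directly with -insheet-'s delimiter() option. It has not been run against the 4Gb file discussed in the thread.

```stata
* Assumption: we create the sample file ourselves so the example is self-contained
tempname fh
file open `fh' using foo.raw, write text replace
file write `fh' "1|ABCD|23|XYZ" _n
file write `fh' "10|BCED|1|YZX" _n
file write `fh' "30|DCHS|234|YBH" _n
file close `fh'

* Replace every pipe with a tab and store the result in foo2.raw
filefilter foo.raw foo2.raw, from("|") to(\t) replace

* Read the tab-delimited copy; nonames because the file has no header row
insheet using foo2.raw, tab nonames clear
list

* Alternative: read the pipe-delimited original directly
insheet using foo.raw, delimiter("|") nonames clear
list
```

Both -insheet- calls should leave the same four variables (v1 to v4) and three observations in memory.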

**Follow-Ups**:
- **Re: st: Importing subset of a pipe delimited textfile** *From:* Maarten Buis <maartenlbuis@gmail.com>

**References**:
- **Re: st: Importing subset of a pipe delimited textfile** *From:* Rob Shaw <rob.shaw.uk@gmail.com>
- **Re: st: Importing subset of a pipe delimited textfile** *From:* Nick Cox <njcoxstata@gmail.com>
