Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Importing subset of a pipe delimited textfile


From   Daniel Feenberg <[email protected]>
To   [email protected]
Subject   Re: st: Importing subset of a pipe delimited textfile
Date   Wed, 17 Oct 2012 07:19:08 -0400 (EDT)


On Wed, 17 Oct 2012, Rob Shaw wrote:

Hi

I have a very large (around 4Gb) text file that has been pipe
delimited. It won't all fit in memory so I want to process it in
parts.

For fixed-format datasets I would use infile with the in 1/10000000 option
then 10000001/20000000 etc. However, this dataset has been pipe
delimited so I would need to use insheet, but insheet doesn't seem to
permit the "in" option.

Can anyone help please?

I take it that there are commas in the data, so that converting the pipes to another delimiter with -filefilter- won't work? You could convert the commas to "~"s first. Or does the data already contain "~"s, with no unused character available at all?
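Whether a spare character exists can be checked before any conversion. A minimal Python sketch, assuming a set of candidate characters of my own choosing (the function name and the candidates are illustrative, not anything from Stata); it reads the file in chunks so the 4Gb never has to fit in memory:

```python
def unused_characters(path, candidates="~^;", chunk_size=1 << 20):
    """Return the candidate characters that never occur in the file.

    The candidates are an assumption; use whatever characters would be
    acceptable as a substitute for the commas or pipes.
    """
    remaining = set(candidates.encode("ascii"))  # byte values still unseen
    with open(path, "rb") as f:
        while remaining:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            remaining -= set(chunk)  # drop every byte value seen in this chunk
    return [chr(b) for b in sorted(remaining)]
```

Any character it returns is safe to use as the replacement in -filefilter-.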

In Unix there is the "split" command, which works on lines. Under Windows there are many split utilities available, none from Microsoft, and most of them split on byte counts, which would work only if your file has fixed-length records. I see there is "Text-File-Splitter", which seems to work on lines. I haven't used it.
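If no line-based splitter is at hand, the same thing can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the function name, the part-file naming, and the idea that the first line is a header to be repeated in every part are not from the original question. It streams line by line, so memory use stays small regardless of file size:

```python
def split_by_lines(path, lines_per_part, prefix="part", header=True):
    """Split a text file into parts of at most lines_per_part data lines.

    If header is True (an assumption about the file), the first line is
    copied to the top of every part. Returns the list of part file names.
    """
    paths = []
    with open(path, "rb") as src:  # binary mode: no encoding guesses
        head = src.readline() if header else b""
        dst = None
        for i, line in enumerate(src):
            if i % lines_per_part == 0:  # time to start a new part
                if dst:
                    dst.close()
                paths.append(f"{prefix}_{len(paths):03d}.txt")
                dst = open(paths[-1], "wb")
                dst.write(head)
            dst.write(line)
        if dst:
            dst.close()
    return paths
```

Each part can then be read in turn with whichever input command handles the delimiter.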

It is a shame that each of Stata's input commands lacks useful features that most of the others have. -in-, -if- and -keep- should all be universal.

dan feenberg


Many thanks
Rob
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

