Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Importing subset of a pipe delimited textfile
Date: Wed, 17 Oct 2012 14:04:17 +0200

I noticed that you did not want to use -insheet- because it does not allow the -in- qualifier and your dataset is too big to read in one go. You can still handle big data in chunks with -insheet- by specifying variables and, when necessary, merging the sub-files later. Before merging, make the sub-files smaller by getting rid of the strings with -encode- followed by -compress-.

-- Maarten

On Wed, Oct 17, 2012 at 1:50 PM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> To give a concrete example: I stored Rob's example dataset in foo.raw.
>
> I then typed in Stata:
>
> filefilter foo.raw foo2.raw, from("|") to(\t) replace
> insheet using foo2.raw
>
> The first line replaced all pipes in the file foo.raw with a tab and
> stored the resulting tab-delimited file in foo2.raw, and the second
> line read this tab-delimited file foo2.raw into Stata.
>
> Hope this helps,
> Maarten
>
> On Wed, Oct 17, 2012 at 1:37 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Why is varying length of line a problem? So long as the same variables
>> are represented on each line, I can see no problem.
>>
>> Also, -filefilter- has a tacit loop; you don't need to set it up for yourself.
>>
>> Nick
>>
>> On Wed, Oct 17, 2012 at 12:33 PM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>>> Nick
>>>
>>> Thanks. Yes, that would work, but the problem is the varying length of
>>> each line. So I need to get filefilter or another command to do one
>>> of:
>>>
>>> x=0
>>> counter=1
>>> with "myfile.txt" {
>>>     y = position of 10000th EOL in `i'
>>>     save `i' from position x to y in "myfilepos"+counter+".txt"
>>>     x = y
>>> }
>>>
>>> This would create files called myfilepos1, myfilepos2 etc., each with
>>> 10000 lines, that I could then -insheet- with a delimiter(|) option.
>>> But I don't know how to correctly specify the bit in the loop.
>>>
>>> OR
>>>
>>> for each line in "myfile.txt" {
>>>     find | and replace with a number of spaces depending on position in row
>>> }
>>>
>>> This would make each line the same length so I could use -infile-.
>>>
>>> Is there a way to use -filefilter- to achieve this?
>>>
>>> File sample:
>>>
>>> 1|ABCD|23|XYZ
>>> 10|BCED|1|YZX
>>> 30|DCHS|234|YBH
>>> ....
>>>
>>> Thanks
>>> Rob
>>>
>>>> I'd use -filefilter- to change the pipes to something that -infile- can handle.
>>>>
>>>> (Strictly, -in- is a qualifier, not an option.)
>>>>
>>>> Nick
>>>>
>>>> On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
>>>>> I have a very large (around 4Gb) text file that has been pipe
>>>>> delimited. It won't all fit in memory so I want to process it in
>>>>> parts.
>>>>>
>>>>> For fixed datasets I would use infile with the in 1/10000000 option
>>>>> then 10000001/20000000 etc. However, this dataset has been pipe
>>>>> delimited so I would need to use insheet, but insheet doesn't seem to
>>>>> permit the "in" option.

--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
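For readers who want the line-based splitting Rob describes in his first pseudocode block, the following is a minimal sketch using Stata's -file- commands. It is not taken from the thread. The file names myfile.txt and myfilepos#.txt follow Rob's example, the part#.dta names are hypothetical, and the sketch assumes the file has no header line and that each line fits in a local macro. Reading a 4 GB file line by line this way is likely to be slow.

```stata
* Sketch: split a pipe-delimited text file into chunks of 10,000 lines,
* then read each chunk with -insheet-. All file names are assumptions.
local chunksize 10000
local n 0          // lines copied so far
local chunk 0      // number of chunk files written so far

tempname in out
file open `in' using "myfile.txt", read text

file read `in' line
while r(eof) == 0 {
    * start a new chunk file every `chunksize' lines
    if mod(`n', `chunksize') == 0 {
        if `chunk' > 0 file close `out'
        local ++chunk
        file open `out' using "myfilepos`chunk'.txt", write text replace
    }
    * macval() copies the line without expanding any macros it contains
    file write `out' `"`macval(line)'"' _n
    local ++n
    file read `in' line
}
file close `in'
if `chunk' > 0 file close `out'

* read each chunk, shrink it, and save it as a Stata dataset
forvalues i = 1/`chunk' {
    insheet using "myfilepos`i'.txt", delimiter("|") clear
    compress
    save "part`i'.dta", replace
}
```

In Stata 13 and later, which postdate this thread, -import delimited- can read a range of rows directly, for example `import delimited using "myfile.txt", delimiters("|") rowrange(1:10000) clear`, so the manual split may not be needed there.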

**References**:

- **Re: st: Importing subset of a pipe delimited textfile** *From:* Rob Shaw <rob.shaw.uk@gmail.com>
- **Re: st: Importing subset of a pipe delimited textfile** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: Importing subset of a pipe delimited textfile** *From:* Maarten Buis <maartenlbuis@gmail.com>
