

From: Rob Shaw <rob.shaw.uk@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Fwd: st: Importing subset of a pipe delimited textfile - resolved (almost)
Date: Wed, 17 Oct 2012 23:04:13 +0100

Hi

Just an update on this, as I've now got it to work thanks to your suggestions. The code is below.

    filefilter myfile.csv `temp1', from("|") to(" ") // this successfully replaces all the pipes with spaces so infile will work

    forvalues counter = 1(1)60 {
        if `counter'==1 {
            local starter=2 // this bit is needed because I want to ignore the existing variable names in the first row
        }
        else {
            local starter = (`counter'-1)*1000000 +1
        }
        local ender = `counter'*1000000
        display `counter' " " `starter' " " `ender' // this is 1 2 1000000 then 2 1000001 2000000 etc
        infile str7 var1 var2 var2a var2b str9 var3 str9 var4 str9 var5 str9 var6 str9 var7 str9 var8 using `temp1' in `starter'/`ender',clear
        display "after infile"
        save newfile`counter'
    }

The file I'm using here is slightly different to the example but the general format is the same.

This all works fine if I paste it into the command window. For some reason it doesn't like the infile line if I put it in a do file. It gives the error

    invalid '2'
    r(198);

Many thanks again for your help

Rob

---------- Forwarded message ----------
From: Rob Shaw <rob.shaw.uk@gmail.com>
Date: 17 October 2012 12:33
Subject: Re: st: Importing subset of a pipe delimited textfile
To: statalist@hsphsun2.harvard.edu

Nick

Thanks. Yes that would work but the problem is the varying length of each line. So I need to get filefilter or another command to do one of:

    x=0
    counter=1
    with "myfile.txt" {
        y = position of 10000th EOL in `i'
        save `i' from position x to y in "myfilepos"+counter+".txt"
        x =y
    }

This would create files called myfilepos1, myfilepos2 etc each with 10000 lines that I could then -insheet- with a delimiter(|) option. But I don't know how to correctly specify the bit in the loop.

OR

    for each line in "myfile.txt" {
        find | and replace with a number of spaces depending on position in row
    }

This would make each line the same length so I could use -infile-

Is there a way to use -filefilter- to achieve this?
File sample:

    1|ABCD|23|XYZ
    10|BCED|1|YZX
    30|DCHS|234|YBH
    ....

Thanks

Rob

>I'd use -filefilter- to change the pipes to something that -infile- can handle.
>(Strictly, -in- is a qualifier, not an option.)
>Nick

>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <rob.shaw.uk@gmail.com> wrote:
> I have a very large (around 4Gb) text file that has been pipe
> delimited. It won't all fit in memory so I want to process it in
> parts.
>
> For fixed datasets I would use infile with the in 1/10000000 option
> then 10000001/20000000 etc. However, this dataset has been pipe
> delimited so I would need to use insheet, but insheet doesn't seem to
> permit the "in" option.
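
For anyone who would rather do the first option from the forwarded message (cutting the file into myfilepos1.txt, myfilepos2.txt, etc. by line count) before loading anything into Stata, here is a minimal sketch in Python. It is not from the thread and is not a Stata solution. The function name `split_by_lines` and its parameters are hypothetical, and it assumes every record ends with a newline character. The file is read in binary mode, so line length and encoding don't matter.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, lines_per_chunk=10000, prefix="myfilepos", skip_header=False):
    """Copy consecutive runs of lines from `source` into prefix1.txt, prefix2.txt, ...

    Hypothetical helper, not part of Stata or of the original thread.
    Returns the list of chunk paths that were written.
    """
    source = Path(source)
    written = []
    # Binary mode: no decoding, and the original line endings are preserved.
    with source.open("rb") as src:
        if skip_header:
            next(src, None)  # discard the first row (variable names)
        counter = 1
        while True:
            # Pull at most lines_per_chunk lines; only one chunk is held in memory.
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break
            out_path = source.with_name(f"{prefix}{counter}.txt")
            with out_path.open("wb") as out:
                out.writelines(chunk)
            written.append(out_path)
            counter += 1
    return written
```

Calling `split_by_lines("myfile.txt")` should leave chunk files next to the source that can each be read with -insheet- and the pipe delimiter, as described in the message. Only the first chunk would carry the header row unless `skip_header=True` is passed, in which case none of them do.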
