Fwd: st: Importing subset of a pipe delimited textfile - resolved (almost)

From	Rob Shaw <[email protected]>
To	[email protected]
Subject	Fwd: st: Importing subset of a pipe delimited textfile - resolved (almost)
Date	Wed, 17 Oct 2012 23:04:13 +0100

Hi

Just an update on this, as I've now got it to work thanks to your suggestions.

The code is below.

* temp1 is assumed to have been declared beforehand, e.g. with -tempfile-
tempfile temp1

* replace every pipe with a space so that -infile- will work
filefilter myfile.csv `temp1', from("|") to(" ")

forvalues counter = 1(1)60 {
 if `counter'==1 {
  * start at row 2 to skip the existing variable names in the first row
  local starter = 2
 }
 else {
  local starter = (`counter'-1)*1000000 + 1
 }
 local ender = `counter'*1000000
 * shows 1 2 1000000, then 2 1000001 2000000, etc.
 display `counter' " " `starter' " " `ender'
 infile str7 var1 var2 var2a var2b str9 var3 str9 var4 str9 var5 str9 var6 str9 var7 str9 var8 using `temp1' in `starter'/`ender', clear
 display "after infile"
 save newfile`counter'
}

The file I'm using here is slightly different to the example but the
general format is the same.

This all works fine if I paste it into the command window. However,
when I put it in a do-file, the -infile- line fails with the error

invalid '2'
r(198);

and I don't know why.

Many thanks again for your help

Rob
---------- Forwarded message ----------
From: Rob Shaw <[email protected]>
Date: 17 October 2012 12:33
Subject: Re: st: Importing subset of a pipe delimited textfile
To: [email protected]


Nick

Thanks. Yes that would work but the problem is the varying length of
each line. So I need to get filefilter or another command to do one
of:

x = 0
counter = 1
for each block of 10000 lines in "myfile.txt" {
 y = position of the next 10000th EOL after x
 save the text from position x to y in "myfilepos"+counter+".txt"
 x = y
 counter = counter + 1
}

This would create files called myfilepos1, myfilepos2 etc each with
10000 lines that I could then -insheet- with a delimiter(|) option.
But I don't know how to correctly specify the bit in the loop.
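
One possible way to write that loop, as a rough sketch rather than a tested solution for a 4Gb file, is to copy the file line by line with -file read- and -file write-, starting a new output file each time the current piece is full. The file names demo_pipe.txt and demo_part#.txt are made up for illustration, the demo file simply holds the three sample lines shown further down in this message, and the chunk size is set to 2 only so that the tiny example produces more than one piece (10000 would match the description above). Reading line by line may be slow on a very large file.

```stata
* Sketch only: file names and the tiny chunk size are illustrative assumptions.
* Write the three sample lines to a demo file so the example is self-contained.
tempname fin fout
file open `fout' using "demo_pipe.txt", write text replace
file write `fout' "1|ABCD|23|XYZ" _n "10|BCED|1|YZX" _n "30|DCHS|234|YBH" _n
file close `fout'

local chunk = 2      // lines per piece; 10000 in the description above
local part  = 1      // index of the piece being written
local n     = 0      // lines written to the current piece

file open `fin'  using "demo_pipe.txt", read text
file open `fout' using "demo_part`part'.txt", write text replace
file read `fin' line
while r(eof) == 0 {
    if `n' == `chunk' {
        * current piece is full, so start the next one
        file close `fout'
        local ++part
        local n = 0
        file open `fout' using "demo_part`part'.txt", write text replace
    }
    file write `fout' `"`macval(line)'"' _n
    local ++n
    file read `fin' line
}
file close `fout'
file close `fin'

* Read each piece with the pipe delimiter and save it as a dataset
forvalues p = 1/`part' {
    insheet using "demo_part`p'.txt", delimiter("|") nonames clear
    save "demo_part`p'", replace
}
```

The -nonames- option is used because the sample has no header row; if the first line of the real file holds variable names, only the first piece would contain them.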

OR

for each line in "myfile.txt" {
 find | and replace with a number of spaces depending on position in row
}

This would make each line the same length so I could use -infile-

Is there a way to use -filefilter- to achieve this?

File sample:

1|ABCD|23|XYZ
10|BCED|1|YZX
30|DCHS|234|YBH
....

Thanks
Rob


>I'd use -filefilter- to change the pipes to something that -infile- can handle.

>(Strictly, -in- is a qualifier, not an option.)

>Nick

>On Wed, Oct 17, 2012 at 9:13 AM, Rob Shaw <[email protected]> wrote:

> I have a very large (around 4Gb) text file that has been pipe
> delimited. It won't all fit in memory so I want to process it in
> parts.
>
> For fixed-format datasets I would use infile with the in 1/1000000 option
> then 1000001/2000000 etc. However, this dataset has been pipe
> delimited so I would need to use insheet, but insheet doesn't seem to
> permit the "in" option.
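
For readers on a later release: Stata 13 and up, which postdate this thread, include -import delimited-, whose rowrange() option can restrict the rows that are read. A minimal sketch, assuming such a release is available and using a made-up file name (demo_rows.txt) filled with the three sample lines from this thread:

```stata
* Sketch only: assumes Stata 13 or later; the file name is illustrative.
tempname fh
file open `fh' using "demo_rows.txt", write text replace
file write `fh' "1|ABCD|23|XYZ" _n "10|BCED|1|YZX" _n "30|DCHS|234|YBH" _n
file close `fh'

* Read only rows 2 to 3 of the pipe-delimited file
import delimited using "demo_rows.txt", delimiters("|") varnames(nonames) rowrange(2:3) clear
list
```

Looping over successive row ranges and saving each piece would then play the role of the -in- qualifier used with -infile-.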