The Stata listserver

st: Re: help on filefilter


From   kturner@stata.com (Kevin Turner)
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: help on filefilter
Date   Mon, 16 May 2005 10:54:24 -0500

Pinaki Mitra (mar1pxm@ups.com) writes:

>I have text files that contain characters which I don't want to keep
>and want those to be replaced by blank spaces. For example, when I do
>filefilter "originalfile.txt" "formatted.txt", from(\Q) to(" ") replace,
>it replaces double quotes by blank spaces and that's what I need.
>
>Now, if I would only like to keep characters between A to Z and 0 to 9
>and replace all other characters by blank spaces, how can I do this?

Ideally, it would be nice if -filefilter- took a range of characters, much 
like regular expressions allow. As it is right now, the solution requires a 
few steps. It is not the only solution, or a particularly efficient one, but
it is an easy solution to implement.

Because each -filefilter- call replaces one specific character, the solution is
to call -filefilter- once for every character in the non-alphanumeric
range. We can accomplish this using a forvalues loop that repeatedly calls 
-filefilter- with a new character to replace.

Every character has an ASCII (google 'ascii char' for info) value associated 
with it. Here are the ASCII values associated with the alphanumeric ranges you
are interested in:

	Character	ASCII Range
	0-9		48-57	
	A-Z		65-90
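To double-check the table, you can print each code next to its character with
Stata's char() function, which returns the character for a given ASCII code.
A quick sketch:

```stata
// print each ASCII code next to the character it stands for
forvalues i = 48/57 {
	display `i' " = " char(`i')     // digits 0-9
}
forvalues i = 65/90 {
	display `i' " = " char(`i')     // uppercase letters A-Z
}
```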

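Using the 3-digit ASCII code format for -filefilter- (\###d), you can replace
any character by its ASCII number. Values 0-31 are non-printable control
characters (tab, line feed, and carriage return among them, which you
presumably want to keep so the file keeps its line breaks), and 32 is the
space itself, so we leave 0-32 alone. The ranges we are interested in
replacing are:
	33-47 
	58-64
	91-255

Note that 97-122 (a-z) falls inside the last range, so lowercase letters are
blanked too. If you want to keep them, replace 91-96 and 123-255 instead.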

Construct a forvalues loop over each of these ranges, running -filefilter-
once for each value. The end result should be what you require.

Note that because the ASCII format for -filefilter- requires 3 digits, all
values under 100 need a leading 0 (for example, 33 is written \033d).
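One way to get the leading 0 without splitting the loop, sketched here as an
alternative rather than as part of the original answer, is to format the loop
value with a zero-padded %03.0f display format through the extended macro
function -display-:

```stata
// zero-pad each code to the three digits that \###d expects
forvalues i = 33/35 {
	local code : display %03.0f `i'
	display "`i' -> `code'"          // e.g. 33 -> 033
}
```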

Also do not forget to copy the output file from -filefilter- back over the
input file on each iteration of the loop, so that each pass works on the
result of the previous one.

Here is a quick example:
-----------------------------------

local myfile1 = "in1.txt" 		// original file to modify
local myfile2 = "in2.txt"		// temporary file

// range 33-47
forvalues i = 33(1)47 {
	filefilter `myfile1' `myfile2', from(\\0`i'd) to(" ") replace
	copy `myfile2' `myfile1', replace
}

// range 58-64
forvalues i = 58(1)64 {
	filefilter `myfile1' `myfile2', from(\\0`i'd) to(" ") replace
	copy `myfile2' `myfile1', replace
}

// range 91-99 (stop before 100 because of leading 0 digit)
forvalues i = 91(1)99 {
	filefilter `myfile1' `myfile2', from(\\0`i'd) to(" ") replace
	copy `myfile2' `myfile1', replace
}

// range 100-255 (no leading 0 needed from 100 on)
forvalues i = 100(1)255 { 
	filefilter `myfile1' `myfile2', from(\\`i'd) to(" ") replace
	copy `myfile2' `myfile1', replace
}
erase `myfile2'
display "all done -- see `myfile1'"
-----------------------------------
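If you prefer a single loop, here is a minimal sketch, assuming the
zero-padded %03.0f format shown earlier: every code is padded to three digits,
so all of them can use the same \\`...'d form as the 100-255 loop above, and
inrange() skips 48-57 and 65-90. The blanked range runs from 33 through 255,
so [ (ASCII 91) is replaced as well. The program name blank_nonalnum is a
hypothetical, illustrative choice; the scratch file comes from -tempfile-, so
nothing needs to be erased afterwards. Like the four-loop version, it still
rewrites the file once per character code, so it is no faster.

```stata
capture program drop blank_nonalnum
program define blank_nonalnum
	// hypothetical helper: blank codes 33-255 except 0-9 and A-Z
	args infile
	tempfile scratch                    // removed when the program ends
	forvalues i = 33/255 {
		// keep digits (48-57) and uppercase letters (65-90)
		if inrange(`i', 48, 57) | inrange(`i', 65, 90) continue
		// zero-pad to the three digits that \###d requires
		local code : display %03.0f `i'
		filefilter "`infile'" "`scratch'", from(\\`code'd) to(" ") replace
		copy "`scratch'" "`infile'", replace
	}
	display "all done -- see `infile'"
end
```

For example, type blank_nonalnum in1.txt. To keep lowercase letters too, add
| inrange(`i', 97, 122) to the if condition.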


Hope this helps,

--Kevin 
kturner@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP