Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Stripping ASCII characters

From   "Thomas, Anthony"
Subject   Re: st: Stripping ASCII characters
Date   Tue, 25 Feb 2014 10:55:46 -0500

Hi Ronan and Sergiy,

I'm not sure whether my response yesterday made it through to the list;
I got a bounce notification this morning. In any event, thanks for the
suggestions. Sergiy: perhaps I am not using filefilter correctly. I
tried the following:

filefilter "f1.csv" "f2.csv", from(026) to() replace // 026 was meant as ^Z's code (26 is its decimal value; hex is 1A)

filefilter "f1.csv" "f2.csv", from(\255d) to() replace

filefilter "f1.csv" "f2.csv", from(^Z) to() replace // which I didn't really expect to work

In all three cases, hexdump shows the same number of control characters
in f2.csv as in f1.csv. I'll give reading the file byte by byte a try,
though. And Ronan, thanks for the suggestion. I tried "sed" (a
command-line stream editor), which removed some of the "^Z" characters
but not all.
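
For reference, here is a minimal sketch of stripping the byte outside Stata. It assumes the problem byte really is ASCII SUB (decimal 26, hex 1A, octal 032) and that a POSIX shell with tr is available. f1.csv and f2.csv are the file names used above; the printf line only creates a small stand-in file for illustration and is not my real data.

```shell
# Stand-in input for illustration only: two CSV lines with embedded SUB (octal 032) bytes.
printf 'a,b\032,c\n1,\0322,3\n' > f1.csv

# Count SUB bytes before: delete everything except SUB, then count what is left.
before=$(tr -cd '\032' < f1.csv | wc -c | tr -d ' ')

# Delete every SUB byte and write the cleaned copy.
tr -d '\032' < f1.csv > f2.csv

# Count SUB bytes in the cleaned copy the same way.
after=$(tr -cd '\032' < f2.csv | wc -c | tr -d ' ')

echo "SUB bytes before: $before, after: $after"
```

tr works on bytes rather than lines, so it should not matter where in a line the character falls. If staying inside Stata, my reading of the filefilter help is that a byte is given as \###d (three-digit decimal) or \##h (two-digit hex), so from(\026d) or from(\1Ah) would be the forms to try; I have not verified that on this file.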



On Tue, Feb 25, 2014 at 8:52 AM, Ronan Conroy wrote:
> Prof. Ronan Conroy
> Associate Professor of Biostatistics
> RCSI Department of Epidemiology and Public Health Medicine
> Royal College of Surgeons in Ireland
> Lower Mercer Street, Dublin 2, Ireland
> T: 01-402-2431
> On 24 Feb 2014, at 21:03, Thomas, Anthony wrote:
>> When insheeting a csv file using Stata 11 - Unix, Stata aborts with the error:
>> too many variables specified
>> error in line 5000000 of file
>> Output of "hexdump" indicated the file contained control characters
>> (^Z), and was in binary format, when it was expected to be ASCII. I
>> tried using "filefilter "f1.csv" "f2.csv", from(^Z) to() replace" to
>> strip the problem characters, but a hexdump on f2.csv indicates the
>> (^Z) are still present. From what I understand, ^Z (sub) is used in
>> place of a character that cannot be read by Stata. Is this the case?
>> If so, is there any way to strip these characters from my file prior
>> to import?
> This is the place where a good text editor comes in handy. Many have a 'strip non-ASCII' command that does what you want.
> I ended up with 4,500 text files of which about 10% were corrupted. BBEdit (free, lite version=TextWrangler) processed the whole lot in a second or two!
> r
> Ronán Conroy
> Associate Professor
> Division of Population Health Sciences
> Royal College of Surgeons in Ireland
> Beaux Lane House
> Dublin 2