
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Stripping ASCII characters


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Stripping ASCII characters
Date   Tue, 25 Feb 2014 11:40:45 -0500

Dear Anthony,

StataCorp designed the -filefilter- command to work with both text and
binary files (as the manual states).
http://www.stata.com/help.cgi?filefilter

It should work with any character in the file, as long as you can
describe that character in your do-file. I cannot reproduce the EOF
defect in -filefilter- that you imply. A test is here:
do http://radyakin.org/statalist/2014/20140225_1130_filefilter_probe.do

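It seems that you forgot the letter "d" in your code, so -filefilter-
is not doing what you expect. -filefilter- writes a decimal character
code as \###d, so ^Z (decimal 26) would be \026d; from(026) searches
for the literal text 026 instead.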
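
For reference, a minimal sketch of the call with the escape written out
in full. It reuses the file names f1.csv and f2.csv from the original
post and assumes that an empty to("") deletes each match; the
-hexdump- line is only a suggested way to check the result.

```stata
* ^Z (SUB) is ASCII 26 in decimal, 1A in hexadecimal.
* -filefilter- escapes: \###d = three-digit decimal code,
*                       \##h  = two-digit hexadecimal code.
filefilter "f1.csv" "f2.csv", from(\026d) to("") replace

* The same request written with the hexadecimal escape:
* filefilter "f1.csv" "f2.csv", from(\1Ah) to("") replace

* Summarize the bytes in the output file to see whether any
* control characters remain.
hexdump "f2.csv", analyze
```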

If you still believe there is a defect in -filefilter-, kindly
share an isolated sequence of data (a few bytes before and after the
character that was not replaced).

Processing a file byte by byte is nevertheless a useful technique in
cases that -filefilter- does not handle, e.g. replacing a sequence only
after a particular other sequence (a semaphore) has been encountered in
the file.
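
A rough sketch of that technique, using Stata's -file- command in
binary mode. The rule here is hypothetical and chosen only for
illustration: drop ^Z (byte 26) only after the first line feed (byte
10) has been seen, with a single byte standing in for the semaphore
sequence. The file names are those from the original post.

```stata
* Hypothetical rule: strip byte 26 (^Z) only after the first
* line feed (byte 10) has been encountered.
tempname in out b
file open `in'  using "f1.csv", read binary
file open `out' using "f2.csv", write binary replace

local armed 0                          // set to 1 once the semaphore is seen
file read `in' %1bu `b'                // read one unsigned byte into a scalar
while r(eof) == 0 {
    if `b' == 10 local armed 1
    * copy the byte unless we are armed and it is ^Z
    if !(`armed' & `b' == 26) file write `out' %1bu (`b')
    file read `in' %1bu `b'
}

file close `in'
file close `out'
```

Reading one byte at a time is slow in Stata, so on a file with millions
of lines this is likely worth it only when -filefilter- cannot express
the rule.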

Hope this helps.

Best, Sergiy Radyakin

On Tue, Feb 25, 2014 at 10:55 AM, Thomas, Anthony
<[email protected]> wrote:
> Hi Ronan and Sergiy,
>
> I'm not sure if my response yesterday made it through to the list; I
> got a bounce notification this morning. In any event, thanks for the
> suggestions. Sergiy: perhaps I am not using filefilter correctly. I
> tried the following:
>
>  filefilter "f1.csv" "f2.csv", from(026) to() replace // 026 is ^Z's hex code
>
> filefilter "f1.csv" "f2.csv", from(\255d) to() replace
>
> and
>
> filefilter "f1.csv" "f2.csv", from(^Z) to() replace // which I didn't
> really expect to work
>
> In all three cases, the number of control characters in hexdump f1.csv
> == number of control characters in hexdump f2.csv. I'll give reading
> the file byte-by-byte a try though. And Ronan, thanks for the
> suggestion, I tried using "sed" (a command line text streaming
> utility) which removed some of the "^Z" but not all.
>
> Thanks,
>
> Anthony
>
> On Tue, Feb 25, 2014 at 8:52 AM, Ronan Conroy <[email protected]> wrote:
>>
>> Prof. Ronan Conroy
>> Associate Professor of Biostatistics
>>
>>
>> RCSI Department of Epidemiology and Public Health Medicine
>> Royal College of Surgeons in Ireland
>> Lower Mercer Street, Dublin 2, Ireland
>> T: 01-402-2431
>> E: [email protected]  W: www.rcsi.ie
>>
>> RCSI DEVELOPING HEALTHCARE LEADERS
>> WHO MAKE A DIFFERENCE WORLDWIDE
>> On 2014 Feb 24, at 21:03, Thomas, Anthony wrote:
>>
>>> When insheeting a csv file using Stata 11 - Unix, Stata aborts with the error:
>>>
>>> too many variables specified
>>> error in line 5000000 of file
>>>
>>> Output of "hexdump" indicated the file contained control characters
>>> (^Z), and was in binary format, when it was expected to be ASCII. I
>>> tried using "filefilter "f1.csv" "f2.csv", from(^Z) to() replace" to
>>> strip the problem characters, but a hexdump on f2.csv indicates the
>>> (^Z) are still present. From what I understand ^Z (sub) is used in
>>> place of a character that cannot be read by Stata, is this the case?
>>> If so, is there any way to strip these characters from my file prior
>>> to import?
>>
>> This is the place where a good text editor comes in handy. Many have a 'strip non-ASCII' command that does what you want.
>>
>> I ended up with 4,500 text files of which about 10% were corrupted. BBEdit (free, lite version=TextWrangler) processed the whole lot in a second or two!
>>
>> r
>>
>> Ronán Conroy
>> [email protected]
>> Associate Professor
>> Division of Population Health Sciences
>> Royal College of Surgeons in Ireland
>> Beaux Lane House
>> Dublin 2
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


