Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Thomas, Anthony" <anthony_h_thomas@brown.edu> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Stripping ASCII characters |
Date | Tue, 25 Feb 2014 10:55:46 -0500 |
Hi Ronan and Sergiy,

I'm not sure if my response yesterday made it through to the list; I got a bounce notification this morning. In any event, thanks for the suggestions.

Sergiy: perhaps I am not using filefilter correctly. I tried the following:

filefilter "f1.csv" "f2.csv", from(026) to() replace // 026 is ^Z's decimal code (hex 1A)

filefilter "f1.csv" "f2.csv", from(\255d) to() replace

and

filefilter "f1.csv" "f2.csv", from(^Z) to() replace // which I didn't really expect to work

In all three cases, the number of control characters in hexdump f1.csv == the number of control characters in hexdump f2.csv. I'll give reading the file byte-by-byte a try though.

And Ronan, thanks for the suggestion. I tried using "sed" (a command-line stream editor), which removed some of the "^Z" but not all.

Thanks,
Anthony

On Tue, Feb 25, 2014 at 8:52 AM, Ronan Conroy <rconroy@rcsi.ie> wrote:
>
> Prof. Ronan Conroy
> Associate Professor of Biostatistics
>
>
> RCSI Department of Epidemiology and Public Health Medicine
> Royal College of Surgeons in Ireland
> Lower Mercer Street, Dublin 2, Ireland
> T: 01-402-2431
> E: rconroy@rcsi.ie W: www.rcsi.ie
>
> RCSI DEVELOPING HEALTHCARE LEADERS
> WHO MAKE A DIFFERENCE WORLDWIDE
> On 2014 Feb 24, at 21:03, Thomas, Anthony wrote:
>
>> When insheeting a csv file using Stata 11 - Unix, Stata aborts with the error:
>>
>> too many variables specified
>> error in line 5000000 of file
>>
>> Output of "hexdump" indicated the file contained control characters
>> (^Z), and was in binary format, when it was expected to be ASCII. I
>> tried using "filefilter "f1.csv" "f2.csv", from(^Z) to() replace" to
>> strip the problem characters, but a hexdump on f2.csv indicates the
>> (^Z) are still present. From what I understand ^Z (sub) is used in
>> place of a character that cannot be read by Stata, is this the case?
>> If so, is there any way to strip these characters from my file prior
>> to import?
>
> This is the place where a good text editor comes in handy.
> Many have a 'strip non-ASCII' command that does what you want.
>
> I ended up with 4,500 text files of which about 10% were corrupted. BBEdit (free, lite version=TextWrangler) processed the whole lot in a second or two!
>
> r
>
> Ronán Conroy
> rconroy@rcsi.ie
> Associate Professor
> Division of Population Health Sciences
> Royal College of Surgeons in Ireland
> Beaux Lane House
> Dublin 2
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
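
For the byte-by-byte approach mentioned in the message, here is a minimal sketch in Python (not Stata), offered as one possible way to do it rather than a tested fix for this particular file. It treats ^Z as the ASCII SUB byte (decimal 26, hex 1A) and removes every ASCII control byte except tab, newline, and carriage return. The function names, the chunk size, and the choice of which bytes to keep are assumptions made for illustration; f1.csv and f2.csv are the file names used in the thread.

```python
SUB = 0x1A  # ^Z (ASCII SUB): decimal 26, hex 1A

# Bytes to delete: control bytes 0x00-0x1F except tab, LF, and CR, plus DEL (0x7F).
# Assumption: bytes >= 0x80 are left alone so non-ASCII text survives.
DELETE = bytes(b for b in range(0x20) if b not in b"\t\n\r") + b"\x7f"


def strip_control_bytes(data: bytes) -> bytes:
    """Return data with the control bytes listed in DELETE removed."""
    return data.translate(None, DELETE)


def clean_file(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without control bytes; return how many bytes were dropped.

    The file is read in binary mode and in chunks, so a large CSV does not
    have to fit in memory and no byte is interpreted as end-of-file.
    """
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            cleaned = strip_control_bytes(chunk)
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed


if __name__ == "__main__":
    import os

    # File names taken from the thread; skipped quietly if f1.csv is absent.
    if os.path.exists("f1.csv"):
        print(clean_file("f1.csv", "f2.csv"), "control bytes removed")
```

Running hexdump on f2.csv afterwards should show whether any 1a bytes remain. If the file turns out to be genuinely binary rather than text with stray control characters, stripping bytes this way may shift or merge fields, so it is worth checking a few rows before importing.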