Getting rid of line-breaks in Data

From Eric A. Booth
Re: Getting rid of line-breaks in Data
Date Thu, 18 Jun 2009 19:49:55 -0500


I had a similar issue with an unknown character (it was a symbol that looked like an em-dash with a dot over it) that, similar to your situation, acted like an end-of-line character for some programs. I used -filefilter- with some of its patterns for EOL characters until one of them knocked it out, which solved my issue.

So, you may try all the EOL patterns mentioned in the -filefilter- help file:

filefilter oldfile.txt newfile.txt , from(\n) to(\t)

If "\n" doesn't work, try substituting "\r", "\M", "\W", or "\U", or a specific ASCII code (you might want to try the decimal code "\254d").
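
Rather than trying the patterns one at a time, the trial and error can be sketched as a short loop. This is only a rough sketch, assuming the placeholder file name oldfile.txt from the example above. It first uses -hexdump- to summarize which line-end and control characters the file contains, then runs -filefilter- once per candidate pattern and reports r(occurrences), the number of matches -filefilter- found. The scratch output file is a throwaway and is not meant to be used.

```stata
* Sketch only: oldfile.txt is the placeholder name used in the example above.

* 1. Summarize the bytes in the file, including line-end and control characters.
hexdump oldfile.txt, analyze

* 2. Count how often each candidate pattern occurs. The output goes to a
*    temporary scratch file; only the count in r(occurrences) matters here.
tempfile scratch
foreach pat in \n \r \M \W \U \254d {
    quietly filefilter oldfile.txt "`scratch'", from("`pat'") to("\t") replace
    display as text "pattern `pat': " as result r(occurrences) " occurrence(s)"
}
```

The counts overlap, because "\M" and "\U" are aliases for "\r" and "\n", and "\W" matches the two-byte "\r\n" sequence. A pattern with a nonzero count is a candidate for the real -filefilter- call that writes newfile.txt.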


On Jun 18, 2009, at 7:32 PM, Matt Spittal wrote:

Dear Elmar,

Carriage returns can be very difficult to deal with. I don't have any clear answers, except to say that I have found a good text editor to be invaluable for cleaning a file. For instance, with my text editor (TextWrangler) I can change between UNIX, Windows and Mac carriage returns and I can use grep functions to find and replace symbols like the carriage return. If you can export your data from Access as a text file (csv) and then clean it within a
text editor, then this might be a good solution.

I am not sure what computer system or text editor you are using at present,
but some very good advice on text editors is given here.

Good luck,

-- Matt

On 18/6/09 5:28 PM, "Elmar Saathoff" wrote:

Dear list members,

I am frequently using data that were imported from PDAs via MsAccess. In some cases these data contain some little squares that do not seem to do
much harm in Stata, but that other applications interpret as
linebreaks/carriage returns/paragraph marks, which is quite a hassle. It seems that these things are inadvertently entered into the PDAs by the
people collecting the data. Unfortunately I cannot show them in this
email, because my email client also interprets them as carriage returns.
Anyway, I have been trying to identify and get rid of these things by
programming (using "subinstr", "egen...split" etc.), but unfortunately, whatever I do, Stata also interprets them as carriage returns, both in
do files and in the command window, even if I change the delimiter to
";" via the delimit command.

Any advice would be greatly appreciated.

Thanks in advance, Elmar
