Statalist



Re: st: Getting rid of line-breaks in Data


From   "Eric A. Booth" <[email protected]>
To   [email protected]
Subject   Re: st: Getting rid of line-breaks in Data
Date   Thu, 18 Jun 2009 19:49:55 -0500

Elmar:

I had a similar issue with an unknown character (it wasn't a box; it was a symbol that looked like an em-dash with a dot over it) that, similar to your situation, acted like an end-of-line character for some programs. I used -filefilter- with some of its EOL patterns until one of them knocked it out, which solved my issue.

So, you may try all the EOL patterns mentioned in the -filefilter- help file:


filefilter oldfile.txt newfile.txt , from(\n) to(\t)




If "\n" doesn't work, try substituting "\r", "\M", "\W", or "\U", or a specific ASCII code (you might want to try the decimal code "\254d"; see http://www.theasciicode.com.ar/ascii-table-codes/ascii-codes-254.html for more).
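To see which pattern actually matches, it may help to check the count that -filefilter- leaves behind in r(occurrences) after each run. The following is only a sketch: oldfile.txt is the placeholder name from above, and the try_*.txt output names are made up for illustration.

```stata
* Run each candidate pattern against the ORIGINAL file and report the
* number of matches. Output file names (try_*.txt) are hypothetical.
filefilter oldfile.txt try_n.txt, from(\n) to(\t) replace
display "\n matches: " r(occurrences)

filefilter oldfile.txt try_r.txt, from(\r) to(\t) replace
display "\r matches: " r(occurrences)

filefilter oldfile.txt try_254.txt, from(\254d) to(\t) replace
display "\254d matches: " r(occurrences)
```

A pattern that reports zero occurrences can be ruled out. Since "\M" is the same as "\r" and "\U" the same as "\n", those two do not need separate runs.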


Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249





On Jun 18, 2009, at 7:32 PM, Matt Spittal wrote:

Dear Elmar,

Carriage returns can be very difficult to deal with. I don't have any clear answers, except to say that I have found a good text editor to be invaluable for cleaning a file. For instance, with my text editor (TextWrangler) I can change between UNIX, Windows and Mac carriage returns and I can use grep functions to find and replace symbols like the carriage return. If you can export your data from Access as a text file (csv) and then clean it within a text editor, then this might be a good solution.
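If the data are already in Stata, a similar find-and-replace can be sketched with the char() function, which refers to a character by its ASCII code so the character itself never has to be typed into a do-file. This is an illustration only: notes is a hypothetical string variable, and the assumption that the offending characters are control characters (codes 1 to 31, such as 10 for line feed and 13 for carriage return) may not hold for your data.

```stata
* notes is a hypothetical string variable; substitute your own.
* Step 1: report which control characters (ASCII 1-31) occur at all.
forvalues c = 1/31 {
    quietly count if strpos(notes, char(`c')) > 0
    if r(N) > 0 {
        display "char(`c') found in " r(N) " observation(s)"
    }
}

* Step 2: replace the usual suspects, line feed (10) and carriage
* return (13), with a space in every observation.
replace notes = subinstr(notes, char(10), " ", .)
replace notes = subinstr(notes, char(13), " ", .)
```

If step 1 reports a different code, the same subinstr() line with that code in char() should remove it.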

I am not sure what computer system or text editor you are using at present,
but some very good advice on text editors is given here.

   http://fmwww.bc.edu/repec/bocode/t/textEditors.html

Good luck,

-- Matt
[email protected]




On 18/6/09 5:28 PM, "Elmar Saathoff" <[email protected]> wrote:

Dear list members,

I am frequently using data that were imported from PDAs via MS Access. In some cases these data contain some little squares that do not seem to do much harm in Stata, but that other applications interpret as linebreaks/carriage returns/paragraph marks, which is quite a hassle. It seems that these things are inadvertently entered into the PDAs by the people collecting the data. Unfortunately I cannot show them in this email, because my email client also interprets them as carriage returns. Anyway, I have been trying to identify and get rid of these things by programming (using "subinstr", "egen...split" etc.), but unfortunately, whatever I do, Stata also interprets them as carriage returns, both in do-files and in the command window, even if I change the delimiter to ";" via the #delimit command.

Any advice would be greatly appreciated.

Thanks in advance, Elmar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



