Statalist



Re: st: Deleting extra blank lines in -cleanlog-


From   Alan Riley <ariley@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Deleting extra blank lines in -cleanlog-
Date   Tue, 19 Feb 2008 12:44:51 -0600

Allan Joseph Medwick (amedwick@gmail.com) asked if there was a
way to delete multiple blank lines from a file:
> Is there a quick way to delete all of the multiple empty lines created
> by -cleanlog-?  I'd like to compress my file, but keep one line
> between any text that is left for readability.

Some others posted suggestions of ways to accomplish this in Microsoft
Word or other software.

Stata has a command, -filefilter-, that performs global
search-and-replace operations on a file.  For example, if
Allan has a log file named x.log created in Windows (where the
end-of-line character combination is \r\n), he could type in Stata

    . filefilter x.log y.log, from(\r\n\r\n\r\n) to(\r\n\r\n)

which would replace every occurrence of three end-of-lines in a
row with two end-of-lines in a row.  Equivalently, he could type

    . filefilter x.log y.log, from(\W\W\W) to(\W\W)

because -filefilter- understands '\W' as a synonym for the Windows
end-of-line character sequence '\r\n'.  Similarly, under Unix,
use '\n' or '\U'.
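
For instance, for a log file created under Unix, the first command
above might be written as follows.  This is only a sketch that swaps
in the '\U' synonym; the filenames are the same illustrative ones.

```stata
* Assumes x.log uses Unix end-of-lines (\n); y.log is the output file.
filefilter x.log y.log, from(\U\U\U) to(\U\U)
```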

There are two things that may not be immediately obvious about
this.  The first is that -filefilter- will likely need to be
called multiple times, because a single pass that changes every
three newlines to two may still leave multiple adjacent empty
lines in the file.  For example, three blank lines in a row (four
end-of-lines) become two blank lines (three end-of-lines) after
one pass.  Let's assume x.log is the original file, and let's use
y.log and z.log as output files with -filefilter- so that we
don't change the original file:

    . filefilter x.log y.log, from(\W\W\W) to(\W\W)
    . filefilter y.log z.log, from(\W\W\W) to(\W\W)
    . filefilter z.log y.log, from(\W\W\W) to(\W\W) replace
    . filefilter y.log z.log, from(\W\W\W) to(\W\W) replace
    . filefilter z.log y.log, from(\W\W\W) to(\W\W) replace
    ...

The above should continue until no more changes are made.  We could
automate this by checking the results returned by -filefilter- to
see whether the from() pattern was found.  If it was not, no changes
were made, and thus there are no more changes to make:

    filefilter x.log y.log, from(\W\W\W) to(\W\W)
    local nchanges = r(occurrences)
    while `nchanges' != 0 {
        filefilter y.log z.log, from(\W\W\W) to(\W\W) replace
        filefilter z.log y.log, from(\W\W\W) to(\W\W) replace
        local nchanges = r(occurrences)
    }

(Remember to use \U on a Unix platform.)

After the code above is executed, 'y.log' will contain the desired
file, and z.log can be discarded.  It is possible that the code
above will call -filefilter- one more time than is necessary, but
that doesn't hurt anything.  Not worrying about that allows the
code to be a little bit easier to read.
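
If leaving z.log behind is a nuisance, one possible variant of the
loop uses a temporary scratch file instead.  This is only a sketch,
not a tested replacement for the code above: the macro name scratch
is made up for illustration, the file is assumed to have Windows
end-of-lines, and y.log is overwritten if it already exists.

```stata
* Scratch file that Stata erases when the do-file or program ends.
tempfile scratch

* First pass: x.log -> y.log (x.log itself is never modified).
filefilter x.log y.log, from(\W\W\W) to(\W\W) replace
local nchanges = r(occurrences)

while `nchanges' != 0 {
    * Filter the current result into the scratch file.
    filefilter y.log "`scratch'", from(\W\W\W) to(\W\W) replace
    local nchanges = r(occurrences)

    * Copy back only if this pass actually changed something;
    * otherwise y.log is already the desired file.
    if `nchanges' != 0 {
        copy "`scratch'" y.log, replace
    }
}
```

Here every call's r(occurrences) is checked, so the loop stops right
after the first pass that finds nothing to replace.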

I mentioned above that two things might not be immediately obvious.
The second of these concerns the number of end-of-line characters
I am searching for and replacing with the from() and to() options.
You might wonder why I specify three EOL characters (\W\W\W) in
-from()- and two EOL characters (\W\W) in -to()-.  You might think
that instead I should look for \W\W and replace it with \W.

Consider the following lines from a file.  I will write EOL everywhere
that the file contains an end-of-line character sequence:

------------------------------------------------------------------
here is a line.  the next two lines are blank in the original file.EOL
EOL
EOL
here is another line.  the next line is blank in the original file.EOL
EOL
this is the last line of the file.EOL
------------------------------------------------------------------

Because there are EOL characters at the end of non-blank lines, if
adjacent pairs of EOL characters ('\W\W' to -filefilter-) were
replaced with single EOL characters ('\W' to -filefilter-) until no
pairs remained, the file above would end up looking like

------------------------------------------------------------------
here is a line.  the next two lines are blank in the original file.EOL
here is another line.  the next line is blank in the original file.EOL
this is the last line of the file.EOL
------------------------------------------------------------------

with no blank lines at all.  Since Allan wants to compress multiple
adjacent blank lines down to single blank lines, moving from
'\W\W\W' to '\W\W' is what should be done, resulting in

------------------------------------------------------------------
here is a line.  the next two lines are blank in the original file.EOL
EOL
here is another line.  the next line is blank in the original file.EOL
EOL
this is the last line of the file.EOL
------------------------------------------------------------------
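
To try the two patterns on a small test file, something like the
following sketch could be used.  The filenames demo.txt,
demo_pairs.txt, and demo_triples.txt are made up for illustration,
and the '\W' patterns assume the code is run under Windows, where _n
in a text-mode -file write- produces \r\n; under Unix, use '\U'
instead.

```stata
* Write a test file with two blank lines, then one blank line.
tempname fh
file open `fh' using demo.txt, write text replace
file write `fh' "here is a line." _n _n _n
file write `fh' "here is another line." _n _n
file write `fh' "this is the last line." _n
file close `fh'

* One pass replacing pairs of end-of-lines with a single one.
filefilter demo.txt demo_pairs.txt, from(\W\W) to(\W) replace
type demo_pairs.txt

* One pass replacing triples of end-of-lines with pairs.
filefilter demo.txt demo_triples.txt, from(\W\W\W) to(\W\W) replace
type demo_triples.txt
```

Comparing the two -type- listings shows which blank lines each
pattern removes in a single pass.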



--Alan Riley
(ariley@stata.com)


