
Re: st: Problem parsing strings that contain "$CHAR"


From   vwiggins@stata.com (Vince Wiggins, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Problem parsing strings that contain "$CHAR"
Date   Wed, 21 Apr 2010 10:54:27 -0500

Benoit-Paul Hebert <benoit.hebert@hrsdc-rhdcc.gc.ca> asks about processing
lines that contain dollar signs -- "$".

> I have to parse the lines of SAS syntax files that contain 1) the
> position of a variable in a raw data file, 2) the name of the
> variable, and 3) the data type of the variable. Examples of such
> lines are:
>
> @04075    Variable_B    $5.
> @04080    Variable_C    $CHAR6.
>
> Both "$#" and "$CHAR#" mean that the variable is a string variable
> of # characters. I have no problem parsing the first line above, but
> am looking for advice on what to do with lines that contain a
> sequence like "$CHAR6". Such sequences appear to be evaluated
> before the string is parsed, and they evaluate to "", as below:
>
> . loc line @04080    Variable_C    $CHAR6.
> . di "`line'"
> @04080    Variable_C    .
>
> Commands like regexm and tokenize or functions like subinstr also
> see "."  instead of "$CHAR6.", making it impossible (for me at
> least) to retrieve all the relevant information in these lines.

What is happening
-----------------

Benoit-Paul is using standard macro expansion when processing the lines in his
file.  That means that Stata is doing what it always does with macro
expansion, which is to resubstitute all macros recursively until there are no
macros left to expand.  An example will help.

   Assume we have used the -file- command to read a line from a
   file into the local macro -myline-, and that the line in the
   file is

      "Hello $CHAR6."

   If we use standard local macro expansion with `myline', e.g., 

      . display `"`myline'"'

   Stata first substitutes the original string for `myline':

      . display `"Hello $CHAR6."'

   Then Stata sees what looks like a global macro -- $CHAR6 --
   and expands that too.  Since we have not defined $CHAR6, it
   expands to nothing, an empty string "":

      . display `"Hello ."'

   And that is what we see in the Results window:

       Hello .
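A quick way to watch the recursive substitution happen, without reading a file, is the do-file sketch below.  It stores the line with the dollar sign escaped by a backslash (the technique explained in the sidebar below) and then temporarily defines a global named CHAR6, so the second round of expansion produces visible text instead of silently producing "".  The value "SIX" is arbitrary and is used only for illustration; the sketch also clears any existing global CHAR6 first.

```stata
* Make sure no global named CHAR6 is defined
global CHAR6

* Store the line literally; the backslash prevents expansion at assignment
local myline `"Hello \$CHAR6."'

* Undefined global: the second round of substitution yields nothing
display `"`myline'"'            // shows: Hello .

* Hypothetical value, defined only to make the recursion visible
global CHAR6 "SIX"
display `"`myline'"'            // shows: Hello SIX.

* Clean up
global CHAR6
```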


Solution using Stata
--------------------

We can tell Stata to expand a macro only once and NOT recursively resubstitute
macros in the result.  We do that with the -macval()- directive (or
pseudo-function, if you prefer).  For example,

      . display `"`macval(myline)'"'

   displays

      Hello $CHAR6.

And that is really all there is to it.
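For Benoit-Paul's layout lines, a complete loop might look like the following minimal sketch.  To keep it self-contained, it first writes his two example lines to a temporary file (in practice you would open the SAS program itself), then reads them back with -file read- and splits each line with -tokenize-, using -macval()- wherever a macro may contain a dollar sign.  The regular expression for the width assumes every format ends in digits followed by a period, as $5. and $CHAR6. do.

```stata
* Write the two example lines to a temporary file; the backslash
* keeps Stata from treating the dollar sign as a global macro
tempfile layout
tempname fh
file open `fh' using `"`layout'"', write text replace
file write `fh' "@04075    Variable_B    \$5." _n
file write `fh' "@04080    Variable_C    \$CHAR6." _n
file close `fh'

* Read the lines back; macval() keeps the contents from being re-expanded
file open `fh' using `"`layout'"', read text
file read `fh' line
while r(eof) == 0 {
    * Split into position, name, and format
    tokenize `"`macval(line)'"'
    * Position: drop the leading @ and convert to a number
    local pos = real(substr("`1'", 2, .))
    * Width: the digits just before the trailing period
    local width
    if regexm(`"`macval(3)'"', "([0-9]+)[.]") local width = regexs(1)
    display "`2': position `pos', width `width', format " `"`macval(3)'"'
    file read `fh' line
}
file close `fh'
```

Only the format token, -`3'-, needs -macval()- here; the position and name never contain a dollar sign, so they can be dereferenced normally.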


Sidebar on solution using Stata
-------------------------------

We must also be careful when creating macros from literal strings that
contain what look like macro references.  This will work:

       . local a `"`macval(myline)'"'

and the local macro -a- will contain "Hello $CHAR6."  This works because we
did not type a literal string to be assigned to -a-; the dollar sign came from
the contents of -myline-, and -macval()- kept it from being expanded again.

If, however, we type

       . local a `"Hello $CHAR6."'

then -a- will contain "Hello ." even before we try to evaluate it.  The macro
substitution for $CHAR6 occurred when -local a `"Hello $CHAR6."'- was run.
When creating macros from literal strings, we need to escape anything in the
string that looks like a macro by preceding it with a backslash:

       . local a `"Hello \$CHAR6."'

We will still need to use `macval(a)' when dereferencing -a-.
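The three cases in this sidebar can be compared side by side.  The sketch below assumes no global named CHAR6 should exist (it clears one if present) and uses the -: length local- extended macro function, which reports how many characters a local holds without expanding its contents.  If things behave as described above, -a- and -c- should each hold the 13 characters of "Hello $CHAR6.", while -b- holds only the 7 characters of "Hello .".

```stata
* Start with no global named CHAR6
global CHAR6

* Case 1: copy from another macro through macval()
local myline `"Hello \$CHAR6."'
local a `"`macval(myline)'"'

* Case 2: type the dollar sign unescaped; it is expanded at assignment
local b `"Hello $CHAR6."'

* Case 3: escape the dollar sign with a backslash
local c `"Hello \$CHAR6."'

* -: length local- reports how many characters each macro holds
foreach m in a b c {
    local len : length local `m'
    display "`m' holds `len' characters: " `"`macval(`m')'"'
}
```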


Solution using Mata
-------------------

Mata is not a macro language like Stata, but a byte-compiled language.  It
treats all letters in a string as just plain letters with no special meaning,
such as macro expansion.

For that reason alone it is a better general-purpose parsing engine than
Stata.  Stata understands Stata and most aspects of its language are built to
make that easy.  That makes Stata great for understanding Stata, but can cause
problems when processing strings that it thinks it should understand (but you
don't want it to).  Mata does not have that problem (virtue?).

Mata has its own host of I/O (file) functions, string functions, and parsing
functions; see -help m4 io- and -help m4 string-.

Beyond the documentation, there is a thorough overview of file and string
processing using Mata in one of Bill Gould's Mata Matters columns,

    Gould, William. 2009. Mata Matters: File processing. 
           Stata Journal 9: 599-620.

The article is still inside the Journal's moving wall, so it is not yet
freely available; it will become free once it passes that wall.
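As a minimal sketch of the Mata route, the code below writes Benoit-Paul's second example line to a temporary file with -file write- (escaping the dollar sign, as in the sidebar), then reads it back in Mata with fopen(), fget(), and tokens(), and strips the dollar sign, "CHAR", and the period with subinstr() to get the width.  Two assumptions: the fields are separated by blanks, and inside the Mata code the dollar sign is written as char(36), because Mata code typed in a do-file is still subject to Stata's macro expansion before Mata compiles it.  The strings fget() returns are plain data and are never expanded.

```stata
* Write one example layout line to a temporary file
tempfile layout
tempname fh
file open `fh' using `"`layout'"', write text replace
file write `fh' "@04080    Variable_C    \$CHAR6." _n
file close `fh'

mata:
// Strings returned by fget() are plain data; no macro expansion occurs
fd = fopen("`layout'", "r")
while ((line = fget(fd)) != J(0, 0, "")) {
    t   = tokens(line)                        // split on blanks
    pos = strtoreal(substr(t[1], 2, .))       // drop the leading @
    // remove the dollar sign, CHAR, and the period to leave the width
    w   = strtoreal(subinstr(subinstr(subinstr(t[3], char(36), ""), "CHAR", ""), ".", ""))
    printf("%s: position %g, width %g, format %s\n", t[2], pos, w, t[3])
}
fclose(fd)
end
```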


-- Vince 
   vwiggins@stata.com
