Re: st: String function headache.

From   Nick Cox <>
Subject   Re: st: String function headache.
Date   Mon, 25 Apr 2011 10:47:48 +0100

To expand on this, with problem-solving hints.

Learning software from definitions is like learning mathematics from
definitions. If you know the concept already, or are super-smart, you
can see immediately what is implied. The rest of us need examples.

In my class learning mathematics in secondary [high] school, there was
one guy who always seemed to understand each new mathematical idea
immediately. (He became a mountaineer, but that is a different story.)
Almost all the rest of us needed examples. (In fact I now guess that he
sometimes played small psychological games with us, as usually he had
read ahead on his own.)

I don't think I had ever used -strmatch()- before answering this
question. I've always used -strpos()- for finding literal matches or
turned to -regex*()-. That just means what it says: I too had to find
out quite how -strmatch()- works.

In my experience, as in Scott's example, the real problem involves a
dataset I care about with variables. But when I don't understand, I
fire up -display- and play with very simple examples.
Here is what I found.

In looking for a literal character, a pattern matches itself:

. di strmatch("2", "2")

but matching means matching, not inclusion:

. di strmatch("42", "2")

You need the pattern to be big enough:

. di strmatch("42", "?2")

. di strmatch("42", "*2")

. di strmatch("42", "*2*")
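
A minimal sketch of what the five calls above should display, assuming
the documented behavior that ? matches exactly one character and *
matches zero or more. It is written for a do-file, as // comments are
not understood when typed interactively.

```stata
* Each result is 1 (match) or 0 (no match); the pattern must cover the whole string.
display strmatch("2", "2")      // 1: the pattern is exactly the string
display strmatch("42", "2")     // 0: the "4" is left uncovered
display strmatch("42", "?2")    // 1: ? stands for exactly one character
display strmatch("42", "*2")    // 1: * stands for zero or more characters
display strmatch("42", "*2*")   // 1: the trailing * matches nothing here
```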

A silly analogy: will a shirt fit you? If it's too small, the answer
is just a No. If it fits exactly, or it's bigger than you are, the
answer is a Yes, and you then have to decide whether too big is a
problem or not. (No for formal wear, possibly OK if you want something
really loose.) Similarly with -strmatch()- the pattern can be bigger
than you need, but the answer will still be a Yes.
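
Applied to the "#" question, a sketch in the same spirit. The string
"APT #4" is a made-up value for illustration, not taken from Scott's
data, and the lines are meant for a do-file.

```stata
* "APT #4" is a hypothetical value used only for illustration.
display strmatch("APT #4", "#")     // 0: a bare "#" matches only the string "#"
display strmatch("APT #4", "*#*")   // 1: "#" anywhere in the string
display strpos("APT #4", "#")       // 5: position of the first "#"
display strpos("APT #4", "#") > 0   // 1: occurrence as a true-or-false flag

* ? always consumes one character, so it does not mean "zero or one".
display strmatch("#", "?#")         // 0: nothing before "#" for ? to match
display strmatch("A#", "?#")        // 1: "A" is the one character ? needs
display strmatch("A", "?")          // 1: any one-character string matches "?"
```

The last line is probably why a lone "?" seemed to flag every
one-character string, whether a number or a letter.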

On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <> wrote:
> If you want to check for occurrence, just use -strpos()- instead. I
> often see people on this list struggling with the regex functions or
> -strmatch()- when a simpler function will do the job. I have offered a
> talk on functions for the London users' meeting and this point is
> already one of the slides.
> foreach y in # {
>     forvalues x = 1/6 {
>         replace mynumber`x' = strpos(mystring`x', "`y'") > 0
>     }
> }
> Otherwise, my understanding is this: a pattern that is just a literal
> character will be matched only by strings that are exactly that
> character; for almost all matching problems, you must specify * and/or
> ?. You seem to be expecting -strmatch()- to behave more like
> -regexm()-, but they have different jobs.
> But as said -strpos()- is easier to figure out.
> Nick
> On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <> wrote:
>> I just can't seem to make this work.  What I want to do is search for any
>> occurrence of the "#" character in a string variable and set a flag for that
>> observation.  I'm searching 6 different strings labeled something like
>> mystring1 mystring2 etc. and the flags are mynumber1 mynumber2 etc..
>> So my do file:
>> forvalues x=1/6 {
>> foreach y in # {
>> replace mynumber`x' = strmatch(mystring`x', "`y'")
>> }
>> }
>> I just listed one character in the y list above, but in reality I'm not
>> having a problem with normal strings like "APT" but with wildcards and with
>> the number sign character itself.
>> I assumed that placing a "?" character in the search string (s2) would
>> match zero or one characters + the "#" but it seems to be matching all
>> strings with one character that are either a number or a letter.  Huh?
>> If I include the wildcard (either the asterisk or the question mark)
>> *anywhere* (either in the "foreach" part of the do file or in the "replace"
>> command) it just doesn't work the way I expect it to.  There's a difference
>> between what I get depending on how many quotes I use and where as well,
>> but I'm just not getting anything that does what I want it to.  I've even
>> tried using the backslash character to indicate that I don't want the "#" to
>> be read as an operator, but I'm not even sure where to put the backslash or
>> how to arrange the quotation marks.  It's driving me nuts.  There's some
>> rule here that I'm just not getting.
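
The advice in this thread can be tried on toy data. This is only a
sketch: the two string variables and their values are invented (the
real problem has six), the flags are created with -generate- rather
than -replace-, and altflag1 and altflag2 are hypothetical names added
to compare the two approaches.

```stata
clear
* Invented example data; mystring1 and mystring2 stand in for the real variables.
input str10 mystring1 str10 mystring2
"APT #4"  "12 Main"
"#7"      "Unit 3#"
"Suite 9" ""
end

forvalues x = 1/2 {
    * -strpos()- returns the position of "#", or 0 if it does not occur
    generate byte mynumber`x' = strpos(mystring`x', "#") > 0
    * -strmatch()- needs * on both sides to mean "anywhere in the string"
    generate byte altflag`x' = strmatch(mystring`x', "*#*")
    * the two flags should agree for every observation
    assert mynumber`x' == altflag`x'
}

list
```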
