



Re: st: String function headache.


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: String function headache.
Date   Mon, 25 Apr 2011 10:47:48 +0100

To expand on this, with problem-solving hints.

Learning software from definitions is like learning mathematics from
definitions. If you know the concept already, or are super-smart, you
can see immediately what is implied. The rest of us need examples.

In my class learning mathematics in secondary [high] school, there was
one guy who always seemed to understand each new mathematical idea
immediately. (He became a mountaineer, but that is a different story:
http://en.wikipedia.org/wiki/Alan_Rouse ). Almost all the rest of us
needed examples. (In fact I now guess that he sometimes played small
psychological games with us, as usually he had read ahead on his own.)

I don't think I had ever used -strmatch()- before answering this
question. I have always used -strpos()- for finding literal matches or
turned to the -regex*()- functions. That means just what it says: I too
had to find out quite how -strmatch()- works.

In my experience, as in Scott's example, the real problem involves a
dataset I care about, with its own variables. But when I don't
understand a function, I fire up -display- and play with very simple
examples. This is what I found.

In looking for a literal character, a pattern matches itself:

. di strmatch("2", "2")
1

but matching means matching, not inclusion:

. di strmatch("42", "2")
0

You need the pattern to be big enough: ? stands for exactly one
character and * for zero or more characters.

. di strmatch("42", "?2")
1

. di strmatch("42", "*2")
1

. di strmatch("42", "*2*")
1

A silly analogy: will a shirt fit you? If it's too small, the answer
is just a No. If it fits exactly, or it's bigger than you are, the
answer is a Yes, and you then have to decide whether too big is a
problem or not. (No for formal wear, possibly OK if you want something
really loose.) Similarly with -strmatch()- the pattern can be bigger
than you need, but the answer will still be a Yes.
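Moving from -display- back to variables, here is a minimal sketch on a toy dataset. The data values and the names flag_pos and flag_match are invented for illustration; only mystring1 echoes the naming in the question. It compares the -strpos()- test for occurrence with a -strmatch()- pattern padded by * on both sides.

```stata
clear
input str10 mystring1
"12 Main #4"
"APT 7"
"#"
end

* occurrence test: strpos() returns the position of "#", or 0 if absent
generate byte flag_pos = strpos(mystring1, "#") > 0

* wildcard test: * on each side lets "#" sit anywhere in the string
generate byte flag_match = strmatch(mystring1, "*#*")

* the two flags should agree in every observation
assert flag_pos == flag_match
list
```

Either approach flags the first and third observations here. The -strpos()- version has the advantage that there are no wildcards to get wrong.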

On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> If you want to check for occurrence, just use -strpos()- instead. I
> often see people on this list struggling with the regex functions or
> -strmatch()- when a simpler function will do the job. I have offered a
> talk on functions for the London users' meeting and this point is
> already one of the slides.
>
> foreach y in # {
>     forvalues x = 1/6 {
>         replace mynumber`x' = strpos(mystring`x', "`y'") > 0
>     }
> }
>
> Otherwise, my understanding is this: a pattern that is just a literal
> character will be matched only by strings that are exactly that
> character; for almost all matching problems, you must specify * and/or
> ?. You seem to be expecting -strmatch()- to behave more like
> -regexm()-, but they have different jobs.
>
> But as said -strpos()- is easier to figure out.
>
> Nick
>
> On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <talkings@gmu.edu> wrote:
>> I just can't seem to make this work.  What I want to do is search for any
>> occurrence of the "#" character in a string variable and set a flag for that
>> observation.  I'm searching 6 different strings labeled something like
>> mystring1 mystring2 etc. and the flags are mynumber1 mynumber2 etc..
>>
>> So my do file:
>>
>> forvalues x=1/6 {
>> foreach y in # {
>> replace mynumber`x' = strmatch(mystring`x', "`y'")
>> }
>> }
>>
>> I just listed one character in the y list above, but in reality I'm not
>> having a problem with normal strings like "APT" but with wildcards and with
>> the number sign character itself.
>>
>> I assumed that placing a "?" character in the search string (s2) would
>> match zero or one characters + the "#" but it seems to be matching all
>> strings with one character that are either a number or a letter.  Huh?
>>
>> If I include the wildcard (either the asterisk or the question mark)
>> *anywhere* (either in the "foreach" part of the do file or in the "replace"
>> command) it just doesn't work the way I expect it to.  There's a difference
>> between what I get depending on how many quotes I use and where as well,
>> but I'm just not getting anything that does what I want it to.  I've even
>> tried using the backslash character to indicate that I don't want the "#" to
>> be read as an operator, but I'm not even sure where to put the backslash or
>> how to arrange the quotation marks.  It's driving me nuts.  There's some
>> rule here that I'm just not getting.
>

