



Re: st: String function headache.


From   Scott Talkington <talkings@gmu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: String function headache.
Date   Mon, 25 Apr 2011 09:08:56 -0400

That is very helpful, thanks. I wasn't sure whether the "#" character was an operator of some kind, and that was the reason I was getting odd results. Apparently it's not, in this case, but it often is. The other thing that always confuses me about these string functions combined with foreach is that I'm never sure where to place the quotes, especially if operators are involved.

--Scott

On 4/25/2011 5:47 AM, Nick Cox wrote:
To expand on this, with problem-solving hints.

Learning software from definitions is like learning mathematics from
definitions. If you know the concept already, or are super-smart, you
can see immediately what is implied. The rest of us need examples.

In my class learning mathematics in secondary [high] school, there was
one guy who always seemed to understand each new mathematical idea
immediately. (He became a mountaineer, but that is a different story:
http://en.wikipedia.org/wiki/Alan_Rouse ). Almost all the rest of us
needed examples. (In fact I now guess that he sometimes played small
psychological games with us, as usually he had read ahead on his own.)

I don't think I've ever used -strmatch()- before answering this
question. I've always used -strpos()- for finding literal matches or
turned to -regex*()-. That just means what it says, but I too had to
find out quite how -strmatch()- works.

In my experience, as in Scott's example, the real problem involves
variables in a dataset I care about. But when I don't understand
something, I fire up -display- and play with very simple examples.
Here is what I found.

In looking for a literal character, a pattern expression matches itself:

. di strmatch("2", "2")
1

but matching means matching, not inclusion:

. di strmatch("42", "2")
0

You need the pattern to be big enough:

. di strmatch("42", "?2")
1

. di strmatch("42", "*2")
1

. di strmatch("42", "*2*")
1

A silly analogy: will a shirt fit you? If it's too small, the answer
is just a No. If it fits exactly, or it's bigger than you are, the
answer is a Yes, and you then have to decide whether too big is a
problem or not. (No for formal wear, possibly OK if you want something
really loose.) Similarly with -strmatch()- the pattern can be bigger
than you need, but the answer will still be a Yes.

On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <njcoxstata@gmail.com> wrote:
If you want to check for occurrence, just use -strpos()- instead. I
often see people on this list struggling with the regex functions or
-strmatch()- when a simpler function will do the job. I have offered a
talk on functions for the London users' meeting and this point is
already one of the slides.

foreach y in # {
    forvalues x = 1/6 {
        replace mynumber`x' = strpos(mystring`x', "`y'") > 0
    }
}
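To see the -strpos()- approach end to end, here is a minimal sketch on a hypothetical two-variable toy dataset (the thread has six variables, mystring1 to mystring6, whose contents we don't know). It uses -generate- rather than -replace- because the flag variables do not exist yet in this toy example.

```stata
* Hypothetical toy data: two string variables, two observations.
clear
input str10 mystring1 str10 mystring2
"Apt #4"  "Main St"
"Suite 2" "#12"
end

* strpos() returns the position of the first "#" (0 if there is none),
* so "> 0" turns it into a 0/1 flag.
forvalues x = 1/2 {
    generate byte mynumber`x' = strpos(mystring`x', "#") > 0
}
list
```

With the real data, the loop bounds would go back to 1/6 and -replace- would be used if the flags already exist.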

Otherwise, my understanding is this: a pattern that is just a literal
character will be matched only by strings that are exactly that
character; for almost all matching problems, you must specify * and/or
?. You seem to be expecting -strmatch()- to behave more like
-regexm()-, but they have different jobs.
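A small sketch of the wildcard rules just described, again with made-up strings. It also suggests why a pattern of just "?" seemed to match every one-character string: "?" stands for exactly one character of any kind, not "zero or one".

```stata
* "?" stands for exactly one character; "*" for zero or more characters.
display strmatch("5", "?")        // 1: any one-character string matches "?"
display strmatch("#", "?#")       // 0: "?" needs one character before "#"
display strmatch("A#", "?#")      // 1
display strmatch("#", "*#")       // 1: "*" may match nothing

* regexm() looks for a match anywhere in the string;
* strmatch() needs the whole string to fit the pattern.
display regexm("Apt #4", "#")     // 1
display strmatch("Apt #4", "#")   // 0
```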

But as I said, -strpos()- is easier to figure out.

Nick

On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <talkings@gmu.edu> wrote:
I just can't seem to make this work.  What I want to do is search for any
occurrence of the "#" character in a string variable and set a flag for that
observation.  I'm searching 6 different strings named something like
mystring1, mystring2, etc., and the flags are mynumber1, mynumber2, etc.

So my do-file is:

forvalues x = 1/6 {
    foreach y in # {
        replace mynumber`x' = strmatch(mystring`x', "`y'")
    }
}

I just listed one character in the y list above, but in reality I'm not
having a problem with normal strings like "APT" but with wildcards and with
the number sign character itself.

I assumed that placing a "?" character in the search string (s2) would
match zero or one characters followed by the "#", but it seems to be
matching all one-character strings, whether a number or a letter.  Huh?

If I include a wildcard (either the asterisk or the question mark)
*anywhere* (either in the "foreach" part of the do-file or in the "replace"
command), it just doesn't work the way I expect it to.  What I get also
differs depending on how many quotes I use and where I put them, but
nothing I try does what I want.  I've even tried using the backslash
character to indicate that I don't want the "#" to be read as an operator,
but I'm not even sure where to put the backslash or how to arrange the
quotation marks.  It's driving me nuts.  There's some rule here that I'm
just not getting.
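On the quoting question, a hedged sketch: "#" needs no backslash inside a Stata string, and a local macro reference such as `y' is expanded before the function sees it, so the reference goes inside the double quotes. The search strings here are hypothetical.

```stata
* No escaping needed: "#" is just a character inside a string literal.
local y "#"
display strpos("Apt #4", "`y'")    // 5

* Quote each element when looping over search strings; foreach strips
* those outer quotes, and `y' is then placed inside double quotes again.
foreach y in "#" "APT" {
    display "`y' found: " (strpos("APT #4", "`y'") > 0)
}
```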
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


© Copyright 1996–2014 StataCorp LP