

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: String function headache.
Date: Mon, 25 Apr 2011 14:58:07 +0100

# has a specific meaning in -#delimit- and -#review-. That should not
interfere with looking for literal "#" as, at most, you may have an
entirely _separate_ command based on one of those.

Scott, your emails seem prone to colourful exaggeration ("always
confuses", "never sure").

Stata expects " " to be used as delimiters for literal strings or
string patterns, string variable names to be used when you are
referring to string variables, and compound double quotes `" "' to be
used as delimiters whenever there are literal " characters in your
string.

Thus there are three rules, which can be summarized in a sentence.
That third rule is really the only one whose precise form is
idiosyncratic to Stata. Unfortunately, but necessarily, there has to
be a third rule so that Stata can distinguish between literal "
characters and string delimiters. The alternative that " could never
be used _within_ a string would not be popular either.

Nick

On Mon, Apr 25, 2011 at 2:08 PM, Scott Talkington <talkings@gmu.edu> wrote:
> That is very helpful, thanks. I wasn't sure whether the "#" character was
> an operator of some kind, and that was the reason I was getting odd results.
> Apparently it's not, in this case, but it often is. The other thing that
> always confuses me about these string functions combined with foreach is
> that I'm never sure where to place the quotes, especially if operators are
> involved.
>
> --Scott
>
> On 4/25/2011 5:47 AM, Nick Cox wrote:
>>
>> To expand on this, with problem-solving hints.
>>
>> Learning software from definitions is like learning mathematics from
>> definitions. If you know the concept already, or are super-smart, you
>> can see immediately what is implied. The rest of us need examples.
>>
>> In my class learning mathematics in secondary [high] school, there was
>> one guy who always seemed to understand each new mathematical idea
>> immediately. (He became a mountaineer, but that is a different story:
>> http://en.wikipedia.org/wiki/Alan_Rouse ). Almost all the rest of us
>> needed examples. (In fact I now guess that he sometimes played small
>> psychological games with us, as usually he had read ahead on his own.)
>>
>> I don't think I've ever used -strmatch()- before answering this
>> question. I've always used -strpos()- for finding literal matches or
>> turned to -regex*()-. That just means what it says, but I too had to
>> find out quite how -strmatch()- works.
>>
>> In my experience, as in Scott's example, the real problem involves a
>> dataset I care about with variables. But when I don't understand, I
>> fire up -display- and play with very simple examples.
>> I found this.
>>
>> In looking for a literal character, a pattern expression matches itself,
>>
>> . di strmatch("2", "2")
>> 1
>>
>> but matching means matching, not inclusion:
>>
>> . di strmatch("42", "2")
>> 0
>>
>> You need the pattern to be big enough
>>
>> . di strmatch("42", "?2")
>> 1
>>
>> . di strmatch("42", "*2")
>> 1
>>
>> . di strmatch("42", "*2*")
>> 1
>>
>> A silly analogy: will a shirt fit you? If it's too small, the answer
>> is just a No. If it fits exactly, or it's bigger than you are, the
>> answer is a Yes, and you then have to decide whether too big is a
>> problem or not. (No for formal wear, possibly OK if you want something
>> really loose.) Similarly with -strmatch()- the pattern can be bigger
>> than you need, but the answer will still be a Yes.
>>
>> On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>
>>> If you want to check for occurrence, just use -strpos()- instead. I
>>> often see people on this list struggling with the regex functions or
>>> -strmatch()- when a simpler function will do the job. I have offered a
>>> talk on functions for the London users' meeting and this point is
>>> already one of the slides.
>>>
>>> foreach y in # {
>>>     forvalues x=1/6 {
>>>         replace mynumber`x' = strpos(mystring`x', "`y'") > 0
>>>     }
>>> }
>>>
>>> Otherwise, my understanding is this: a pattern that is just a literal
>>> character will be matched only by strings that are exactly that
>>> character; for almost all matching problems, you must specify * and/or
>>> ?. You seem to be expecting -strmatch()- to behave more like
>>> -regexm()-, but they have different jobs.
>>>
>>> But as said -strpos()- is easier to figure out.
>>>
>>> Nick
>>>
>>> On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <talkings@gmu.edu> wrote:
>>>>
>>>> I just can't seem to make this work. What I want to do is search for any
>>>> occurrence of the "#" character in a string variable and set a flag for
>>>> that observation. I'm searching 6 different strings labeled something
>>>> like mystring1 mystring2 etc. and the flags are mynumber1 mynumber2 etc.
>>>>
>>>> So my do file:
>>>>
>>>> forvalues x=1/6 {
>>>>     foreach y in # {
>>>>         replace mynumber`x' = strmatch(mystring`x', "`y'")
>>>>     }
>>>> }
>>>>
>>>> I just listed one character in the y list above, but in reality I'm not
>>>> having a problem with normal strings like "APT" but with wildcards and
>>>> with the number sign character itself.
>>>>
>>>> I assumed that placing a "?" character in the search string (s2) would
>>>> match zero or one characters + the "#" but it seems to be matching all
>>>> strings with one character that are either a number or a letter. Huh?
>>>>
>>>> If I include the wildcard (either the asterisk or the question mark)
>>>> *anywhere* (either in the "foreach" part of the do file or in the
>>>> "replace" command) it just doesn't work the way I expect it to. There's
>>>> a difference between what I get depending on how many quotes I use and
>>>> where as well, but I'm just not getting anything that does what I want
>>>> it to. I've even tried using the backslash character to indicate that I
>>>> don't want the "#" to be read as an operator, but I'm not even sure
>>>> where to put the backslash or how to arrange the quotation marks. It's
>>>> driving me nuts. There's some rule here that I'm just not getting.
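As a rough sketch only, the three quoting rules and the -strpos()- suggestion from this thread might be tried out together on toy data as follows. The data values are invented for illustration, the variable names mystring1, mystring2, mynumber1 and mynumber2 merely follow the naming used in the thread, and hasquote is a hypothetical name. The flags are initialised to 0 and then switched on, which is one possible arrangement (not the one posted above) that also copes with more than one search string in the -foreach- list.

```stata
* Toy data, invented for illustration; names follow the thread's naming
clear
set obs 3
generate str20 mystring1 = "APT #4" in 1
replace mystring1 = "12 Main St" in 2
* Rule 3: compound double quotes because the literal contains " characters
replace mystring1 = `"say "hi""' in 3
generate str20 mystring2 = "#" in 1
replace mystring2 = "no sign" in 2

* Rule 1: " " delimits the literal being searched for
* Rule 2: the string variable is referred to by its bare name
forvalues x = 1/2 {
    generate byte mynumber`x' = 0
    foreach y in # {
        replace mynumber`x' = 1 if strpos(mystring`x', "`y'") > 0
    }
}

* Rule 3 again: searching for a literal " needs compound double quotes
* (hasquote is a hypothetical variable name)
generate byte hasquote = strpos(mystring1, `"""') > 0

list

* -strmatch()- needs wildcards on both sides to test for inclusion
display strmatch("APT #4", "#")
display strmatch("APT #4", "*#*")
```

On this toy data only the first observation should be flagged in mynumber1 and mynumber2, and only the third in hasquote; the two -display- lines should show 0 and then 1, in line with the -strmatch()- examples quoted above.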
