
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: String function headache.


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: String function headache.
Date   Mon, 25 Apr 2011 14:58:07 +0100

# has a specific meaning in -#delimit- and -#review-. That should not
interfere with looking for literal "#" as, at most, you may have an
entirely _separate_ command based on one of those.

Scott, your emails seem prone to colourful exaggeration ("always
confuses", "never sure").

Stata expects " " to be used as delimiters for literal strings or
string patterns, string variable names to be used when you are
referring to string variables, and compound double quotes `"  "' to be
used as delimiters whenever there are literal " characters in your
string. Thus there are three rules, which can be summarized in a
sentence. That third rule is really the only one whose precise form is
idiosyncratic to Stata. Unfortunately, but necessarily, there has to
be a third rule so that Stata can distinguish between literal "
characters and string delimiters. The alternative that " could never
be used _within_ a string would not be popular either.

Nick

On Mon, Apr 25, 2011 at 2:08 PM, Scott Talkington <[email protected]> wrote:
> That is very helpful, thanks.  I wasn't sure whether the "#" character was
> an operator of some kind, and that was the reason I was getting odd results.
>  Apparently it's not, in this case, but it often is.  The other thing that
> always confuses me about these string functions combined with foreach is
> that I'm never sure where to place the quotes, especially if operators are
> involved.
>
> --Scott
>
> On 4/25/2011 5:47 AM, Nick Cox wrote:
>>
>> To expand on this, with problem-solving hints.
>>
>> Learning software from definitions is like learning mathematics from
>> definitions. If you know the concept already, or are super-smart, you
>> can see immediately what is implied. The rest of us need examples.
>>
>> In my class learning mathematics in secondary [high] school, there was
>> one guy who always seemed to understand each new mathematical idea
>> immediately. (He became a mountaineer, but that is a different story:
>> http://en.wikipedia.org/wiki/Alan_Rouse ). Almost all the rest of us
>> needed examples. (In fact I now guess that he sometimes played small
>> psychological games with us, as usually he had read ahead on his own.)
>>
>> I don't think I had ever used -strmatch()- before answering this
>> question. I've always used -strpos()- for finding literal matches or
>> turned to -regex*()-. That just means what it says, but I too had to
>> find out exactly how -strmatch()- works.
>>
>> In my experience, as in Scott's example, the real problem involves a
>> dataset I care about with variables. But when I don't understand, I
>> fire up -display- and play with very simple examples.
>> I found this.
>>
>> In looking for a literal character, a pattern expression matches itself:
>>
>> . di strmatch("2", "2")
>> 1
>>
>> but matching means matching, not inclusion:
>>
>> . di strmatch("42", "2")
>> 0
>>
>> You need the pattern to be big enough:
>>
>> . di strmatch("42", "?2")
>> 1
>>
>> . di strmatch("42", "*2")
>> 1
>>
>> . di strmatch("42", "*2*")
>> 1
>>
>> A silly analogy: will a shirt fit you? If it's too small, the answer
>> is just a No. If it fits exactly, or it's bigger than you are, the
>> answer is a Yes, and you then have to decide whether too big is a
>> problem or not. (No for formal wear, possibly OK if you want something
>> really loose.) Similarly with -strmatch()- the pattern can be bigger
>> than you need, but the answer will still be a Yes.
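The matching rule shown by the -display- examples above can be sketched outside Stata: the whole string must match the pattern, ? stands for exactly one character, * stands for zero or more characters, and any other character matches only itself. The following is a minimal Python sketch under exactly those assumptions. The name wildcard_match is hypothetical; this is not Stata's implementation, and it makes no attempt to handle any escaping of * or ?.

```python
def wildcard_match(s: str, pattern: str) -> bool:
    """Return True if the WHOLE of s matches pattern (not just part of it).

    '?' matches exactly one character, '*' matches zero or more characters,
    and every other character matches only itself.
    """
    m = len(pattern)
    # prev[j] is True when the first i characters of s match pattern[:j];
    # start with i = 0 (empty prefix), which only a run of '*' can match.
    prev = [True] + [False] * m
    for j in range(1, m + 1):
        prev[j] = prev[j - 1] and pattern[j - 1] == "*"
    for ch in s:
        cur = [False] * (m + 1)  # a non-empty prefix never matches ""
        for j in range(1, m + 1):
            p = pattern[j - 1]
            if p == "*":
                # '*' absorbs nothing (cur[j-1]) or one more character (prev[j])
                cur[j] = cur[j - 1] or prev[j]
            elif p == "?" or p == ch:
                cur[j] = prev[j - 1]
        prev = cur
    return prev[m]


# Mirrors the -display strmatch()- examples; prints 1, 0, 1, 1, 1 in turn.
for s, p in [("2", "2"), ("42", "2"), ("42", "?2"), ("42", "*2"), ("42", "*2*")]:
    print(f'strmatch("{s}", "{p}") ~ {int(wildcard_match(s, p))}')
```

Under this rule a pattern of "?" on its own matches any one-character string and nothing longer, while "*#*" is the pattern that means "contains #".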
>>
>> On Mon, Apr 25, 2011 at 9:28 AM, Nick Cox <[email protected]> wrote:
>>>
>>> If you want to check for occurrence, just use -strpos()- instead. I
>>> often see people on this list struggling with the regex functions or
>>> -strmatch()- when a simpler function will do the job. I have offered a
>>> talk on functions for the London users' meeting and this point is
>>> already one of the slides.
>>>
>>> foreach y in # {
>>>     forvalues x = 1/6 {
>>>         replace mynumber`x' = strpos(mystring`x', "`y'") > 0
>>>     }
>>> }
>>>
>>> Otherwise, my understanding is this: a pattern that is just a literal
>>> character will be matched only by strings that are exactly that
>>> character; for almost all matching problems, you must specify * and/or
>>> ?. You seem to be expecting -strmatch()- to behave more like
>>> -regexm()-, but they have different jobs.
>>>
>>> But, as I said, -strpos()- is easier to figure out.
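The -strpos()- approach is just a containment test, so no wildcards are needed at all. Below is a Python sketch of the same flagging loop, assuming each observation is held as a dict with keys mystring1 to mystring6 (a hypothetical representation, not how Stata stores data). The helpers strpos and flag_char are hypothetical: strpos mimics the convention the Stata code relies on with "> 0", namely a 1-based position of the first occurrence, or 0 if the substring is absent.

```python
def strpos(s: str, sub: str) -> int:
    """Mimic Stata's strpos() for a non-empty sub: 1-based position of the
    first occurrence of sub in s, or 0 if sub does not occur."""
    return s.find(sub) + 1  # str.find() is 0-based and returns -1 if absent


def flag_char(obs: dict, char: str = "#", n: int = 6) -> dict:
    """For k = 1..n, set mynumber<k> to 1 if mystring<k> contains char, else 0.

    Mirrors: replace mynumber`x' = strpos(mystring`x', "`y'") > 0
    """
    for k in range(1, n + 1):
        obs[f"mynumber{k}"] = int(strpos(obs[f"mystring{k}"], char) > 0)
    return obs


# One made-up observation, for illustration only.
values = ["12 Main St #4", "APT 7", "#", "", "PO Box 5", "Suite#9"]
obs = {f"mystring{k}": v for k, v in enumerate(values, start=1)}
flag_char(obs)
print([obs[f"mynumber{k}"] for k in range(1, 7)])  # [1, 0, 1, 0, 0, 1]
```

The equivalent -strmatch()- test would need the pattern "*#*"; -strpos()- gets the same answer without any pattern syntax.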
>>>
>>> Nick
>>>
>>> On Mon, Apr 25, 2011 at 4:45 AM, Scott Talkington <[email protected]> wrote:
>>>>
>>>> I just can't seem to make this work.  What I want to do is search for any
>>>> occurrence of the "#" character in a string variable and set a flag for
>>>> that observation.  I'm searching 6 different strings labeled something
>>>> like mystring1 mystring2 etc. and the flags are mynumber1 mynumber2 etc.
>>>>
>>>> So my do file:
>>>>
>>>> forvalues x = 1/6 {
>>>>     foreach y in # {
>>>>         replace mynumber`x' = strmatch(mystring`x', "`y'")
>>>>     }
>>>> }
>>>>
>>>> I just listed one character in the y list above, but in reality I'm not
>>>> having a problem with normal strings like "APT" but with wildcards and
>>>> with the number sign character itself.
>>>>
>>>> I assumed that placing a "?" character in the search string (s2) would
>>>> match zero or one characters + the "#" but it seems to be matching all
>>>> strings with one character that are either a number or a letter.  Huh?
>>>>
>>>> If I include the wildcard (either the asterisk or the question mark)
>>>> *anywhere* (either in the "foreach" part of the do file or in the
>>>> "replace" command) it just doesn't work the way I expect it to.  There's
>>>> a difference between what I get depending on how many quotes I use and
>>>> where as well, but I'm just not getting anything that does what I want
>>>> it to.  I've even tried using the backslash character to indicate that I
>>>> don't want the "#" to be read as an operator, but I'm not even sure where
>>>> to put the backslash or how to arrange the quotation marks.  It's driving
>>>> me nuts.  There's some rule here that I'm just not getting.


