Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: strings
Thu, 2 Feb 2012 21:43:07 +0100
Thanks Nick, that is how I did it.
Robert, thanks for the examples. I was looking for something like word(),
but probably missed it in the help files.
On 2 February 2012 16:28, Robert Picard <email@example.com> wrote:
> It's perhaps worth noting that some string functions can work from the
> end of the string.
> *----------- begin example -------------
> input str50 s
> "ABC BLINCAR COMPANY INC"
> "Abc Blincar Company Inc"
> "Abc Blin Inc Company Inc"
> "Abc Blin Co Company Co"
> end
> * To match a specific substring at the end
> gen hasinc = substr(upper(s),-4,.) == " INC"
> clonevar s2 = s
> replace s2 = substr(s,1,length(s) - 4) if hasinc
> * more generally, to split a string at the last word
> gen tail1 = word(s,-1)
> gen head1 = substr(s,1,length(s) - length(tail1) - 1)
> * this can also be done with regexs
> gen head2 = regexs(1) if regexm(s,"(.+) ")
> gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")
> * or with regexr if you don't care about the tail
> gen head3 = regexr(s," [^ ]+$","")
> *------------ end example --------------
> On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <firstname.lastname@example.org> wrote:
>> replace company = substr(company, 1, length(company) - 4) if substr(company, -4, 4) == " INC"
>> is a better way to remove any trailing " INC".
>> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <email@example.com> wrote:
>>> That's not really a -split- problem. "INC" is not a string separator
>>> here. I am credited as the original author of -split- so I can tell
>>> you that it was not designed for this.
>>> The easiest recipe (!) I can think of is
>>> gen reversed = reverse(company)
>>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>>> replace company = reverse(reversed)
>>> That zaps " INC" if and only if it is the last four characters of your variable.
>>> The three commands above could be telescoped into one with some loss of clarity.
>>> I can believe that this may not delete all you want to delete.
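
For reference, the telescoping mentioned above might look like the following. This is only a minimal sketch, not necessarily the form Nick had in mind; the second observation is made up to show that an "INC" in the middle of a name is left alone.

```stata
* Sketch: the three-step reverse() recipe collapsed into one -replace-.
* The test data are illustrative; "ABC INC HOLDINGS" is a made-up name.
clear
input str50 company
"ABC BLINCAR COMPANY INC"
"ABC INC HOLDINGS"
end

* Reverse, drop the leading "CNI " once, and reverse back,
* but only where the reversed string starts with "CNI ".
replace company = reverse(subinstr(reverse(company), "CNI ", "", 1)) if substr(reverse(company), 1, 4) == "CNI "

list company
```

The first name should end up as "ABC BLINCAR COMPANY" and the second should be unchanged. The one-liner calls reverse() twice per observation, which is the loss of clarity traded for brevity.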
>>> 1. I am using the split command to divide my string variables into parts. Is
>>> there any way to force the split only at the last occurrence of the split
>>> string? E.g. if strings are like "ABC BLINCAR COMPANY INC" and I want to remove
>>> the "INC" from all the strings: if I use split, p(INC) I will get "ABC
>>> BL" instead of "ABC BLINCAR COMPANY".
>>> 2. Is there any way to force Stata to ignore letter case when
>>> comparing strings?
>>> E.g. if I merge 2 files by a string variable, I want the name "ROGER"
>>> and the name "Roger" to be recognized as the same string.
>>> NJC: In general, you have to clean up inconsistencies before -merge-. -merge- has a difficult enough job as it is!
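
One way to do that clean-up for the letter-case question is to build an upper-case key in both datasets and merge on the key rather than on the raw name. The sketch below is an assumption-laden illustration: the variable names (name, score, age, key) and the data are hypothetical, and it needs to be run as a whole from a do-file because it uses a tempfile.

```stata
* Sketch: case-insensitive match via a normalised key variable.
* All variable names and values here are made up for illustration.
clear
input str10 name score
"ROGER" 1
"Anna" 2
end
gen key = upper(name)          // "ROGER", "ANNA"
tempfile scores
save `scores'

clear
input str10 name age
"Roger" 30
"ANNA" 25
end
gen key = upper(name)          // same normalisation in the master data

* Merge on the cleaned key, not on the original mixed-case name
merge 1:1 key using `scores'
list
```

upper() only deals with case. Leading or trailing blanks and other inconsistencies would need their own clean-up before the -merge-.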
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/