Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: strings


From   Robert Picard <[email protected]>
To   [email protected]
Subject   Re: st: RE: strings
Date   Thu, 2 Feb 2012 10:28:52 -0500

It's perhaps worth noting that some string functions can work from the
end of the string.

*----------- begin example -------------

clear
input str50 s
 "ABC BLINCAR COMPANY INC"
 "Abc Blincar Company Inc"
 "Abc Blin Inc Company Inc"
 "Abc Blin Co Company Co"
 end

* To match a specific substring at the end

gen hasinc = substr(upper(s),-4,.) == " INC"
clonevar s2 = s
replace s2 = substr(s,1,length(s) - 4) if hasinc

* more generally, to split a string at the last word

gen tail1 = word(s,-1)
gen head1 = substr(s,1,length(s) - length(tail1) - 1)

* this can also be done with regular expressions; (.+) is greedy,
* so head2 runs up to the last space

gen head2 = regexs(1) if regexm(s,"(.+) ")
gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")

* or with regexr if you don't care about the tail

gen head3 = regexr(s," [^ ]+$","")

*------------ end example --------------
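
The three-command reverse() recipe quoted further down is described there as something that could be telescoped into one command. A minimal sketch of what that might look like follows; the variable name company comes from the quoted message, while the two test strings are made up for illustration.

```stata
* Sketch only: one-command version of the reverse()/subinstr() recipe.
* The data below are illustrative, not from the original poster's file.
clear
input str40 company
"ABC BLINCAR COMPANY INC"
"ABC INC HOLDINGS"
end

* Reverse the string, drop the first "CNI " (that is, a trailing " INC"),
* and reverse back; the if condition restricts this to strings ending in " INC".
replace company = reverse(subinstr(reverse(company), "CNI ", "", 1)) ///
    if substr(reverse(company), 1, 4) == "CNI "

list
```

The substr(company, -4, 4) approach quoted below does the same job with less work; the sketch is only meant to show the telescoping.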

On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <[email protected]> wrote:
> replace company = substr(company, 1, length(company) - 4) if
> substr(company, -4, 4) == " INC"
>
> is a better way to remove any trailing " INC".
>
>
> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <[email protected]> wrote:
>
>> That's not really a -split- problem. "INC" is not a string separator
>> here. I am credited as the original author of -split- so I can tell
>> you that it was not designed for this.
>>
>> The easiest recipe (!) I can think of is
>>
>> gen reversed = reverse(company)
>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>> replace company = reverse(reversed)
>>
>> That zaps " INC" if and only if it is the last four characters of your variable.
>>
>> The three commands above could be telescoped into one with some loss of clarity.
>>
>> I can believe that this may not delete all you want to delete.
>>
>
>> KOTa
>
>> 1.
>> I am using the split command to divide my string variables into parts. Is
>> there any way to force the split only at the last occurrence of the split
>> sequence?
>>
>> E.g. if strings are like "ABC BLINCAR COMPANY INC" and I want to remove
>> the "INC" from all the strings. If I use split, p(INC) I will get "ABC
>> BL" instead of "ABC BLINCAR COMPANY".
>>
>>
>> 2. Is there any way to force Stata to ignore letter case when
>> comparing strings?
>> E.g. if I merge 2 files by a string variable, I want the name "ROGER"
>> and the name "Roger" to be recognized as the same string.
>>
>> NJC>>> In general, you have to clean up inconsistencies before -merge-. -merge- has a difficult enough job as it is!
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>

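On question 2 in the original post (ignoring case in -merge-), the advice quoted above is to clean up inconsistencies first. One possible way to do that, sketched here under assumptions, is to build an upper-cased key in both files and merge on the key. The variable names name_key, x and y, the tempfile name, and the data are all hypothetical.

```stata
* Sketch only: case-insensitive match via a normalized key variable.
* All names and values here are made up for illustration.
clear
input str20 name byte x
"ROGER" 1
"Alice" 2
end
gen name_key = upper(trim(name))   // same key regardless of letter case
tempfile left
save `left'

clear
input str20 name byte y
"Roger" 10
"alice" 20
end
gen name_key = upper(trim(name))

* Merge on the normalized key rather than on the raw names
merge 1:1 name_key using `left'
list name_key x y _merge
```

This handles letter case and leading or trailing blanks only; other inconsistencies (punctuation, abbreviations) would still need their own cleanup before -merge-.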
