Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: strings
From: Robert Picard <firstname.lastname@example.org>
Subject: Re: st: RE: strings
Date: Thu, 2 Feb 2012 10:28:52 -0500
It's perhaps worth noting that some string functions, such as substr()
and word(), accept negative positions and so can work from the end of
the string.
*----------- begin example -------------
clear
input str50 s
"ABC BLINCAR COMPANY INC"
"Abc Blincar Company Inc"
"Abc Blin Inc Company Inc"
"Abc Blin Co Company Co"
end
* To match a specific substring at the end
gen hasinc = substr(upper(s),-4,.) == " INC"
clonevar s2 = s
replace s2 = substr(s,1,length(s) - 4) if hasinc
* more generally, to split a string at the last word
gen tail1 = word(s,-1)
gen head1 = substr(s,1,length(s) - length(tail1) - 1)
* this can also be done with regexs
gen head2 = regexs(1) if regexm(s,"(.+) ")
gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")
* or with regexr if you don't care about the tail
gen head3 = regexr(s," [^ ]+$","")
*------------ end example --------------
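As a variation on the example above, here is a minimal sketch that removes a trailing " INC" in one step regardless of case. The variable name s3 is only illustrative, and spelling out the character classes is one assumed way to make regexr() ignore case; it is not part of the original example.

```stata
* Sketch: strip a trailing " INC" in any mix of upper and lower case
clear
input str50 s
"ABC BLINCAR COMPANY INC"
"Abc Blincar Company Inc"
"Abc Blin Inc Company Inc"
"Abc Blin Co Company Co"
end
* $ anchors the match at the end, so an "Inc" in the middle is left alone
gen s3 = regexr(s, " [Ii][Nn][Cc]$", "")
list s s3, clean
```

Strings that do not end in " INC" are left unchanged, so no -if- qualifier is needed.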
On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <email@example.com> wrote:
> replace company = substr(company, 1, length(company) - 4) if
> substr(company, -4, 4) == " INC"
> is a better way to remove any trailing " INC".
> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <firstname.lastname@example.org> wrote:
>> That's not really a -split- problem. "INC" is not a string separator
>> here. I am credited as the original author of -split- so I can tell
>> you that it was not designed for this.
>> The easiest recipe (!) I can think of is
>> gen reversed = reverse(company)
>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>> replace company = reverse(reversed)
>> That zaps " INC" if and only if it is the last four characters of your variable.
>> The three commands above could be telescoped into one with some loss of clarity.
>> I can believe that this may not delete all you want to delete.
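For what it's worth, one way the three -reverse()- commands might be telescoped into a single command is sketched below. The sample data are made up for illustration, and the /// continuation assumes the code is run from a do-file.

```stata
* Sketch: the reverse/subinstr/reverse recipe as a single command
clear
input str50 company
"ABC BLINCAR COMPANY INC"
"INC HOLDINGS INC"
"ABC BLINCAR COMPANY"
end
* Reverse, drop the first "CNI " (i.e. the last " INC"), reverse back
replace company = reverse(subinstr(reverse(company), "CNI ", "", 1)) ///
    if substr(reverse(company), 1, 4) == "CNI "
list company, clean
```

Only a " INC" in the last four characters is removed; an "INC" elsewhere in the name is untouched.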
>> 1. I am using the split command to divide my string variables into parts. Is
>> there any way to force the split only at the last occurrence of the split
>> string? E.g., if strings are like "ABC BLINCAR COMPANY INC" and I want to
>> remove the "INC" from all the strings: if I use split, p(INC) I will get "ABC
>> BL" instead of "ABC BLINCAR COMPANY".
>> 2. Is there any way to force Stata to ignore letter case when
>> comparing strings?
>> E.g., if I merge 2 files by a string variable, I want the name "ROGER"
>> and the name "Roger" to be recognized as the same string.
>> NJC>>> In general, you have to clean up inconsistencies before -merge-. -merge- has a difficult enough job as it is!
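One possible way to do that clean-up before -merge- is to build a case-normalized key in both files and merge on the key. In the sketch below the data, the variable names (name, key, x, y) and the tempfile name are all hypothetical, and the `merge 1:1` syntax assumes Stata 11 or later.

```stata
* Sketch: merge on an upper-cased, trimmed copy of the name
clear
input str10 name x
"ROGER" 1
"Anna"  2
end
gen key = upper(trim(name))
tempfile master
save `master'

clear
input str10 name y
"Roger" 10
"ANNA"  20
end
gen key = upper(trim(name))
drop name
* Both files now carry the same key for "ROGER"/"Roger"
merge 1:1 key using `master'
list key x y _merge, clean
```

Keeping the original name variable in one file and merging on a separate key leaves the raw spelling available for checking afterwards.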
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/