Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Robert Picard <picard@netbox.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: strings
Date: Thu, 2 Feb 2012 10:28:52 -0500

It's perhaps worth noting that some string functions can work from
the end of the string.

*----------- begin example -------------
clear
input str50 s
"ABC BLINCAR COMPANY INC"
"Abc Blincar Company Inc"
"Abc Blin Inc Company Inc"
"Abc Blin Co Company Co"
end

* To match a specific substring at the end
gen hasinc = substr(upper(s),-4,.) == " INC"
clonevar s2 = s
replace s2 = substr(s,1,length(s) - 4) if hasinc

* more generally, to split a string at the last word
gen tail1 = word(s,-1)
gen head1 = substr(s,1,length(s) - length(tail1) - 1)

* this can also be done with regexs
gen head2 = regexs(1) if regexm(s,"(.+) ")
gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")

* or with regexr if you don't care about the tail
gen head3 = regexr(s," [^ ]+$","")
*------------ end example --------------

On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> replace company = substr(company, 1, length(company) - 4) if substr(company, -4, 4) == " INC"
>
> is a better way to remove any trailing " INC".
>
> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> That's not really a -split- problem. "INC" is not a string separator
>> here. I am credited as the original author of -split- so I can tell
>> you that it was not designed for this.
>>
>> The easiest recipe (!) I can think of is
>>
>> gen reversed = reverse(company)
>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>> replace company = reverse(reversed)
>>
>> That zaps " INC" if and only if it is the last four characters of
>> your variable.
>>
>> The three commands above could be telescoped into one with some loss
>> of clarity.
>>
>> I can believe that this may not delete all you want to delete.
>>
>> KOTa
>>
>> 1. I am using the split command to divide my string variables into
>> parts. Is there any way to force the split only at the last
>> occurrence of the split sequence?
>>
>> e.g. if strings are like "ABC BLINCAR COMPANY INC" and I want to
>> remove the "INC" from all the strings. If I use split, p(INC) I will
>> get "ABC BL" instead of "ABC BLINCAR COMPANY".
>>
>> 2. Is there any way to force Stata to ignore letter case when
>> comparing strings?
>> e.g. if I merge 2 files by a string variable, I want the name "ROGER"
>> and the name "Roger" to be recognized as the same string.
>>
>> NJC>>> In general, you have to clean up inconsistencies before
>> -merge-. -merge- has a difficult enough job as it is!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
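
On question 2 in the quoted thread, one way to follow the advice to clean up before -merge- is to build a case-normalized key in both files and match on that. The following is only a minimal sketch, not anything posted in the thread: the two small datasets and the variable names name, key, x and y are hypothetical, and it assumes the cleaned key uniquely identifies observations in each file.

```stata
* Sketch only: the data and the names name, key, x, y are hypothetical.
clear
input str10 name byte x
"ROGER" 1
"Anna" 2
end
* upper() removes case differences; trim() drops leading/trailing blanks
gen key = upper(trim(name))
tempfile first
save `first'

clear
input str10 name byte y
"Roger" 10
"ANNA" 20
end
gen key = upper(trim(name))

* match on the cleaned key rather than on the raw name
merge 1:1 key using `first'
list
```

If two names within one file differ only by case, the key is no longer unique and -merge 1:1- will stop with an error; running -isid key- in each file beforehand is one way to check.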