From
KOTa <kota.alba@gmail.com>

To
statalist@hsphsun2.harvard.edu

Subject
Re: st: RE: strings

Date
Thu, 2 Feb 2012 21:43:07 +0100

Thanks Nick, that is how I did it.

Robert, thanks for the examples. I was looking for something like word(), but probably missed it in the help files.

On 2 February 2012 16:28, Robert Picard <picard@netbox.com> wrote:
> It's perhaps worth noting that some string functions can work from the
> end of the string.
>
> *----------- begin example -------------
>
> clear
> input str50 s
> "ABC BLINCAR COMPANY INC"
> "Abc Blincar Company Inc"
> "Abc Blin Inc Company Inc"
> "Abc Blin Co Company Co"
> end
>
> * To match a specific substring at the end
>
> gen hasinc = substr(upper(s),-4,.) == " INC"
> clonevar s2 = s
> replace s2 = substr(s,1,length(s) - 4) if hasinc
>
> * more generally, to split a string at the last word
>
> gen tail1 = word(s,-1)
> gen head1 = substr(s,1,length(s) - length(tail1) - 1)
>
> * this can also be done with regexs
>
> gen head2 = regexs(1) if regexm(s,"(.+) ")
> gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")
>
> * or with regexr if you don't care about the tail
>
> gen head3 = regexr(s," [^ ]+$","")
>
> *------------ end example --------------
>
> On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> replace company = substr(company, 1, length(company) - 4) if substr(company, -4, 4) == " INC"
>>
>> is a better way to remove any trailing " INC".
>>
>>
>> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> That's not really a -split- problem. "INC" is not a string separator
>>> here. I am credited as the original author of -split- so I can tell
>>> you that it was not designed for this.
>>>
>>> The easiest recipe (!) I can think of is
>>>
>>> gen reversed = reverse(company)
>>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>>> replace company = reverse(reversed)
>>>
>>> That zaps " INC" if and only if it is the last four characters of your variable.
>>>
>>> The three commands above could be telescoped into one with some loss of clarity.
>>>
>>> I can believe that this may not delete all you want to delete.
>>>
>>> KOTa
>>>
>>> 1.
>>> I am using the split command to divide my string variables into parts. Is
>>> there any way to force the split only at the last occurrence of the split
>>> sequence?
>>>
>>> E.g. if strings are like "ABC BLINCAR COMPANY INC" and I want to remove
>>> the "INC" from all the strings. If I use split, p(INC) I will get "ABC
>>> BL" instead of "ABC BLINCAR COMPANY".
>>>
>>>
>>> 2. Is there any way to force Stata to ignore letter case when
>>> comparing strings?
>>> E.g. if I merge 2 files by a string variable I want the name "ROGER"
>>> and the name "Roger" to be recognized as the same string.
>>>
>>> NJC
>>> In general, you have to clean up inconsistencies before -merge-. -merge- has a difficult enough job as it is!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
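
On question 2 (ignoring case when merging), a minimal sketch of the "clean up before -merge-" advice might look like the following. It is not from the thread itself: the file names a.dta and b.dta, the variable name, and the helper variable key are all hypothetical, and the merge 1:1 syntax assumes Stata 11 or later.

```stata
* Assumption: a.dta and b.dta are hypothetical files, each with a string
* variable called name. Build an upper-case, trimmed key in both files
* and merge on the key rather than on the raw names.

use b, clear
gen key = upper(trim(name))
isid key                    // stops with an error if two names differ only by case
rename name name_b          // keep the spelling from b for later inspection
tempfile bkey
save `bkey'

use a, clear
gen key = upper(trim(name))
isid key
merge 1:1 key using `bkey'
```

With this, "ROGER" and "Roger" both map to the key "ROGER". The isid checks are there because forcing upper case can make previously distinct names collide, in which case a 1:1 merge on the key would fail.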

