Re: st: RE: strings


From   KOTa <kota.alba@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: strings
Date   Thu, 2 Feb 2012 21:43:07 +0100

Thanks Nick, that is how I did it.

Robert, thanks for the examples. I was looking for something like word(),
but probably missed it in the help files.

On 2 February 2012 at 16:28, Robert Picard <picard@netbox.com> wrote:
> It's perhaps worth noting that some string functions can work from the
> end of the string.
>
> *----------- begin example -------------
>
> clear
> input str50 s
>  "ABC BLINCAR COMPANY INC"
>  "Abc Blincar Company Inc"
>  "Abc Blin Inc Company Inc"
>  "Abc Blin Co Company Co"
>  end
>
> * To match a specific substring at the end
>
> gen hasinc = substr(upper(s),-4,.) == " INC"
> clonevar s2 = s
> replace s2 = substr(s,1,length(s) - 4) if hasinc
>
> * more generally, to split a string at the last word
>
> gen tail1 = word(s,-1)
> gen head1 = substr(s,1,length(s) - length(tail1) - 1)
>
> * this can also be done with regexs
>
> gen head2 = regexs(1) if regexm(s,"(.+) ")
> gen tail2 = regexs(1) if regexm(s," ([^ ]+)$")
>
> * or with regexr if you don't care about the tail
>
> gen head3 = regexr(s," [^ ]+$","")
>
> *------------ end example --------------
>
> On Thu, Feb 2, 2012 at 4:56 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> replace company = substr(company, 1, length(company) - 4) if substr(company, -4, 4) == " INC"
>>
>> is a better way to remove any trailing " INC".
>>
>>
>> On Wed, Feb 1, 2012 at 12:23 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>
>>> That's not really a -split- problem. "INC" is not a string separator
>>> here. I am credited as the original author of -split- so I can tell
>>> you that it was not designed for this.
>>>
>>> The easiest recipe (!) I can think of is
>>>
>>> gen reversed = reverse(company)
>>> replace reversed = subinstr(reversed, "CNI ", "", 1) if substr(reversed, 1, 4) == "CNI "
>>> replace company = reverse(reversed)
>>>
>>> That zaps " INC" if and only if it is the last four characters of your variable.
>>>
>>> The three commands above could be telescoped into one with some loss of clarity.
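A minimal sketch of what the telescoped one-command form might look like, assuming the variable is named company as in the thread. The two input rows are made-up test data, and the `///` line continuation assumes the code is run from a do-file rather than typed interactively.

```stata
* Sketch only: toy data; the variable name company follows the thread
clear
input str40 company
"ABC BLINCAR COMPANY INC"
"ABC INC HOLDINGS"
end

* Reverse the string, drop the leading "CNI " once, then reverse back.
* The if qualifier restricts the change to strings that end in " INC".
replace company = reverse(subinstr(reverse(company), "CNI ", "", 1)) ///
    if substr(reverse(company), 1, 4) == "CNI "

list company
```

Only the first row changes, because the second row contains " INC" in the middle rather than at the end.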
>>>
>>> I can believe that this may not delete all you want to delete.
>>>
>>
>>> KOTa
>>
>>> 1.
>>> i am using split command to divide my string variables into parts, is
>>> there any way to force the split only by last occurrence of the split
>>> sequence?
>>>
>>> e.g. if strings are like "ABC BLINCAR COMPANY INC" and i want to remove
>>> the "INC" from all the strings. if i use split, p(INC) i will get "ABC
>>> BL" instead of "ABC BLINCAR COMPANY".
>>>
>>>
>>> 2. is there any way to force stata to ignore letters case when
>>> comparing strings?
>>> e.g. if i merge 2 files by string variable i want that name "ROGER"
>>> and name "Roger" would be recognized as the same string
>>>
>>> NJC>>> In general, you have to clean up inconsistencies before -merge-. -merge- has a difficult enough job as it is!
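One way to do the clean-up before -merge- can be sketched as follows: build a lowercased, trimmed key in both datasets and merge on that key instead of the raw name. The datasets, the variable names (name, score, age, key) and the tempfile name are all hypothetical and only illustrate the idea; the `merge 1:1` syntax assumes Stata 11 or later.

```stata
* Sketch only: both datasets and all variable names are hypothetical
clear
input str20 name score
"ROGER" 1
"Anna"  2
end
* Normalized key: lower case, no leading or trailing blanks
gen key = lower(trim(name))
drop name
tempfile other
save `other'

clear
input str20 name age
"Roger" 40
"ANNA"  35
end
gen key = lower(trim(name))

* Merge on the cleaned key rather than on the raw name
merge 1:1 key using `other'
list
```

With these toy data every observation matches, so _merge is 3 throughout. Real data may still need further cleaning (punctuation, internal double spaces) before the keys agree.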
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/