Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Regular expressions


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Regular expressions
Date   Sun, 9 Mar 2014 11:00:11 +0000

You could apply the reverse of the regular expression to the reversed
string to remove the last instance of a parenthesised date.


Nick
[email protected]
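
The reversal idea above can be sketched as follows. This is only an illustration, not code from the thread: the variable name movie7 is hypothetical, and the example string is borrowed from Roberto's code quoted below. Because -regexr()- replaces only the first match, the first match in the reversed string corresponds to the last parenthesised year in the original.

```stata
clear
set obs 1
gen movie = "Robin (1984) Hood (2000)"

* Reverse the string, so that "(2000)" becomes ")0002(".
* The pattern is the year pattern written backwards: ")", three digits,
* a leading 1 or 2, then "(".
* regexr() drops only the first match, i.e. the last year in the original.
* Reverse again and trim to restore the title.
gen movie7 = trim(reverse(regexr(reverse(movie), "\)[0-9][0-9][0-9][1-2]\(", "")))

list
```

Unlike anchoring the pattern with $, this should also cope with trailing text after the final year, at the cost of being harder to read.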


On 9 March 2014 03:15, Roberto Ferrer <[email protected]> wrote:
> Previous solutions to Estrella's Problem 1 won't work if the movie name
> itself contains a year in parentheses. I think that is not common, but
> maybe we can rely on the assumption that the year of the movie is always
> at the end.
>
> *------------------ begin code ------------------
>
>
> clear all
> set more off
>
> set obs 1
> gen movie = "Robin (1984) Hood (2000)"
>
> * Fail
> gen movie2 = rtrim(substr(movie, 1, index(movie, "(") - 1))
> gen movie3 = trim(regexr(movie, "(\([1-2][0-9][0-9][0-9]\))", ""))
>
> * Alternatives
> gen movie4 = trim(substr(movie, 1, length(movie)-6))
> gen movie5 = trim(reverse(substr(reverse(movie), 7, .)))
> gen movie6 = trim(regexr(movie, "(\([1-2][0-9][0-9][0-9]\)$)", ""))
>
> list
>
> *----------------- end code ----------------------
>
>
> On Fri, Mar 7, 2014 at 9:11 AM, Joe Canner <[email protected]> wrote:
>> Good point about the first digit :)
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Friday, March 07, 2014 8:38 AM
>> To: [email protected]
>> Subject: Re: st: Regular expressions
>>
>> Unsurprisingly, this is almost identical to Joe's.
>>
>> I feel confident that the first digit must be 1 or 2.
>> Nick
>> [email protected]
>>
>>
>> On 7 March 2014 13:35, Nick Cox <[email protected]> wrote:
>>> clear
>>> set obs 1
>>> gen test = "Robin Hood (2000)"
>>> gen test2 = trim(regexr(test, "(\([1-2][0-9][0-9][0-9]\))", ""))
>>> list
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 7 March 2014 13:28, Marco Savegnago <[email protected]> wrote:
>>>> Dear all,
>>>> as regards point 1), this might work:
>>>>
>>>> gen movie2 = rtrim(substr(movie, 1, index(movie, "(") - 1))
>>>>
>>>> I think it works as long as the title of the movie does not contain
>>>> any round brackets other than those around the year.
>>>>
>>>> What do you think?
>>>> best,
>>>> Marco
>>>>
>>>> 2014-03-07 12:49 GMT+01:00 Nick Cox <[email protected]>:
>>>>> Your second problem sounds like a job for -split-. I wouldn't reach for
>>>>> regular expressions there.
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 7 March 2014 11:40, Estrella Gomez <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I would like to make two modifications to two string variables using
>>>>>> regular expressions:
>>>>>>
>>>>>> 1) I have a list of movie titles with a year included; for instance:
>>>>>> "Robin Hood (2010)". I would like to drop the years and the
>>>>>> parentheses, so the final value should be "Robin Hood". The number of
>>>>>> words in the title varies a lot across movies.
>>>>>>
>>>>>> 2) I have a variable indicating where the movie was produced. In some
>>>>>> cases there are several countries, for instance "UK, Germany, Canada,
>>>>>> Switzerland". I would like to generate one variable per country (the 1st
>>>>>> variable takes the value UK, the 2nd Germany, and so on). Again, the number
>>>>>> of countries per movie is not fixed; it varies from 1 to 4.
>>>>>>
>>>>>> Any suggestion?
>>>>>>
>>>>>> Thanks a lot,
>>>>>> Estrella
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/

