Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Regular expressions


From   Roberto Ferrer <[email protected]>
To   Stata Help <[email protected]>
Subject   Re: st: Regular expressions
Date   Sat, 8 Mar 2014 22:45:02 -0430

The previous solutions to Estrella's Problem 1 won't work if the movie title
itself contains a year in parentheses. I think that is uncommon, but we can
instead rely on the assumption that the movie's year is always at the end of
the string.

*------------------ begin code ------------------


clear all
set more off

set obs 1
gen movie = "Robin (1984) Hood (2000)"

* Fail: movie2 cuts at the first "(", movie3 removes only the first year
gen movie2 = rtrim(substr(movie, 1, index(movie, "(") - 1))
gen movie3 = trim(regexr(movie, "(\([1-2][0-9][0-9][0-9]\))", ""))

* Alternatives: all assume the string ends with "(YYYY)", i.e. 6 characters
gen movie4 = trim(substr(movie, 1, length(movie)-6))
gen movie5 = trim(reverse(substr(reverse(movie), 7, .)))
gen movie6 = trim(regexr(movie, "(\([1-2][0-9][0-9][0-9]\)$)", ""))

list

*----------------- end code ----------------------
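
The alternatives above depend on the year really being at the end of every
title, so it may be worth checking that before stripping anything. What
follows is only a sketch under that assumption: the three example titles and
the variable names has_year and title are made up for illustration, and the
year pattern is the same one used above.

```stata
clear all
set obs 3
gen movie = "Robin (1984) Hood (2000)" in 1
replace movie = "Robin Hood (2010)" in 2
replace movie = "Robin Hood" in 3

* Flag titles that end in a four-digit year in parentheses;
* trim() first, so that trailing blanks do not defeat the $ anchor
gen byte has_year = regexm(trim(movie), "\([1-2][0-9][0-9][0-9]\)$")

* Strip the trailing year only where the pattern matched
gen title = trim(movie)
replace title = trim(regexr(trim(movie), "\([1-2][0-9][0-9][0-9]\)$", "")) if has_year

list
```

Titles with has_year == 0 are left as they are, so they can be listed and
inspected by hand instead of being truncated silently.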


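For Problem 2, Nick's suggestion of -split- might look roughly like the
sketch below. The variable name country, the stub ctry and the two example
values are assumptions for illustration only; the real variable names are
not given in the thread.

```stata
clear all
set obs 2
gen country = "UK, Germany, Canada, Switzerland" in 1
replace country = "France" in 2

* Split on commas into ctry1, ctry2, ... (one new variable per country)
split country, parse(",") generate(ctry)

* Each piece after a comma keeps its leading blank, so trim the new variables
foreach v in `r(varlist)' {
    replace `v' = trim(`v')
}

list
```

-split- creates as many new variables as the longest list needs, and movies
with fewer countries get empty strings in the remaining ones.
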
On Fri, Mar 7, 2014 at 9:11 AM, Joe Canner <[email protected]> wrote:
> Good point about the first digit :)
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Friday, March 07, 2014 8:38 AM
> To: [email protected]
> Subject: Re: st: Regular expressions
>
> Unsurprisingly, this is almost identical to Joe's.
>
> I feel confident that the first digit must be 1 or 2.
> Nick
> [email protected]
>
>
> On 7 March 2014 13:35, Nick Cox <[email protected]> wrote:
>> clear
>> set obs 1
>> gen test = "Robin Hood (2000)"
>> gen test2 = trim(regexr(test, "(\([1-2][0-9][0-9][0-9]\))", ""))
>> list
>> Nick
>> [email protected]
>>
>>
>> On 7 March 2014 13:28, Marco Savegnago <[email protected]> wrote:
>>> Dear all,
>>> as regards point 1), this might work:
>>>
>>> gen movie2 = rtrim(substr(movie, 1, index(movie, "(") - 1))
>>>
>>> I think it works as long as the title of the movie does not contain
>>> any round brackets other than those around the year.
>>>
>>> What do you think?
>>> best,
>>> Marco
>>>
>>> 2014-03-07 12:49 GMT+01:00 Nick Cox <[email protected]>:
>>>> Your second problem sounds like a job for -split-. I wouldn't reach for
>>>> regular expressions there.
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 7 March 2014 11:40, Estrella Gomez <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I would like to make two modifications to two string variables using
>>>>> regular expressions:
>>>>>
>>>>> 1) I have a list of movie titles with a year included; for instance:
>>>>> "Robin Hood (2010)". I would like to drop the years and the
>>>>> parentheses, so the final value should be "Robin Hood". The number of
>>>>> words in the title varies a lot across movies.
>>>>>
>>>>> 2) I have a variable indicating where the movie was produced. In some
>>>>> cases there are several countries, for instance "UK, Germany, Canada,
>>>>> Switzerland". I would like to generate one variable per country (the 1st
>>>>> variable takes the value UK, the 2nd Germany, and so on). Again, the
>>>>> number of countries per movie is not fixed; it varies from 1 to 4.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Thanks a lot,
>>>>> Estrella
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> *   http://www.ats.ucla.edu/stat/stata/

