Statalist



Re: st: AW: regular expression matching


From   joe j <[email protected]>
To   [email protected]
Subject   Re: st: AW: regular expression matching
Date   Sat, 16 Jan 2010 21:47:35 +0100

Thanks, Steve, for your two suggestions; they have been informative and
will be useful (sorry for not responding sooner; I was away).
JJ

On Mon, Jan 11, 2010 at 8:33 PM,  <[email protected]> wrote:
> Just to finish this off, perhaps. Joe was looking for a regular
> expression to match everything preceding the first comma. The
> following will work in BBEdit and in Stata; the part to be
> extracted is the subexpression inside the parentheses.
>
> "^([^\,]+)\,.+$"
>
> If the last ".+$" is omitted, the expression will work in Stata but
> not in BBEdit.
>
> The following expression works in BBEdit but not in Stata:
> "^(.+?)\,.+$"
>
> Stata's regular expression engine apparently does not support the
> non-greedy quantifier "?".
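>
> A minimal sketch of how these patterns might be applied to Joe's
> example data. The variable name first_u is hypothetical, and the
> second -generate- assumes Stata 14 or later, where the Unicode
> functions ustrregexm() and ustrregexs() accept the non-greedy
> quantifier; it will not run in earlier versions.
>
> ```stata
> clear
> input str60 address
> "4905 Lakeway Drive, College Station, Texas 77845 USA"
> "673 Jasmine Street, Los Angeles, CA 90024"
> end
>
> * [^,]+ cannot cross a comma, so the capture stops at the first one
> gen first = regexs(1) if regexm(address, "^([^,]+),")
>
> * Assumption: Stata 14+; .+? is non-greedy and stops at the first comma
> gen first_u = ustrregexs(1) if ustrregexm(address, "^(.+?),")
>
> list address first first_u, noobs
> ```
>
> The negated character class is the more portable choice, since it
> does not depend on non-greedy support in the regex engine.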
>
> -Steve
>
> On Mon, Jan 11, 2010 at 10:27 AM,  <[email protected]> wrote:
>> "regexm(address,"^([0-9a-zA-Z\.\-\' ]+)\,")"
>> does.
>>
>> Steve
>>>>>
>>>>> If you do insist on using -string- functions (see [D], p. 224):
>>>>>
>>>>>
>>>>> *************
>>>>> clear
>>>>> input str60 address
>>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>>> "2376 First street, San Diego, CA 90126"
>>>>> "6 West Central St, Tempe AZ 80068"
>>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>>> end
>>>>>
>>>>> compress
>>>>>
>>>>> gen str25 first=substr(address, 1, strpos(address, ",")-1)
>>>>> l address first, noo
>>>>> *************
>>>>>
>>>>>
>>>>>
>>>>> HTH
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of joe j
>>>>> Sent: Monday, 11 January 2010 14:23
>>>>> To: [email protected]
>>>>> Subject: Re: st: AW: regular expression matching
>>>>>
>>>>> Fantastic! Thanks much, Martin.
>>>>>
>>>>> On Mon, Jan 11, 2010 at 2:07 PM, Martin Weiss <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *************
>>>>>> clear
>>>>>> input str60 address
>>>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>>>> "2376 First street, San Diego, CA 90126"
>>>>>> "6 West Central St, Tempe AZ 80068"
>>>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>>>> end
>>>>>>
>>>>>> split address, parse(,)
>>>>>> ren address1 first
>>>>>>
>>>>>> l address first, noo
>>>>>> *************
>>>>>>
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of joe j
>>>>>> Sent: Monday, 11 January 2010 14:03
>>>>>> To: [email protected]
>>>>>> Subject: st: regular expression matching
>>>>>>
>>>>>> From a string address variable I want to extract the portion of the
>>>>>> text preceding the first comma.
>>>>>>
>>>>>> Let me illustrate this with the following example:
>>>>>>
>>>>>> clear
>>>>>> input str60 address
>>>>>> "4905 Lakeway Drive, College Station, Texas 77845 USA"
>>>>>> "673 Jasmine Street, Los Angeles, CA 90024"
>>>>>> "2376 First street, San Diego, CA 90126"
>>>>>> "6 West Central St, Tempe AZ 80068"
>>>>>> "1234 Main St. Cambridge, MA 01238-1234"
>>>>>> end
>>>>>>
>>>>>> From the address column, I want to create a column named First:
>>>>>>
>>>>>> 4905 Lakeway Drive
>>>>>> 673 Jasmine Street
>>>>>> 2376 First street
>>>>>> 6 West Central St
>>>>>> 1234 Main St. Cambridge
>>>>>>
>>>>>> I tried the following:
>>>>>> gen first = regexs(1) if (regexm(address, "(.*)[,]"))
>>>>>>
>>>>>> This however extracts everything in address preceding the last comma,
>>>>>> not the first comma.
>>>>>>
>>>>>> Any pointers would be appreciated.
>>>>>> JJ
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/statalist/faq
>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>>
>>> --
>>> Steven Samuels
>>> [email protected]
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> 845-246-0774
>>>


