RE: st: Removing a particular expression from string variable


From   Enayetur Raheem <eraheem@wechealthunit.org>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Removing a particular expression from string variable
Date   Wed, 12 Oct 2011 17:14:54 +0000

Ah, thanks! 


 
Enayet 
Data Analyst | Epidemiology Department | Ext.1337

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
Sent: October-12-11 1:08 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Removing a particular expression from string variable

Try this:

replace address = regexr(address, "(^[\#][0-9]+[ ]*[\-][ ]*)","")

DVM
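
The suggestion above can be sketched end to end as follows. This is only a minimal illustration, assuming the sample data from the original question (first three rows); the variable name address_clean is hypothetical and not from the thread. Because generate cannot reuse the name of an existing variable, the sketch writes the result to a new variable and leaves address untouched; use replace instead if overwriting in place is acceptable.

```stata
* Sketch: strip a leading "#<digits> - " prefix and keep the rest of the string
clear
input str60 address
"#12-4905 Lakeway Drive, College Station, Texas 77845 USA"
"#12 - 673 Jasmine Street, Los Angeles, CA 90024"
"2376 First street, San Diego, CA 90126"
end

* regexr() replaces the first match of the pattern with the third argument;
* rows that do not match the pattern are returned unchanged.
* address_clean is a hypothetical name chosen for this illustration.
gen str60 address_clean = regexr(address, "(^[\#][0-9]+[ ]*[\-][ ]*)", "")

list address address_clean, noobs
```

With this pattern, the first two rows should lose their "#12-" and "#12 - " prefixes, while the third row, which has no such prefix, should come through as is.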


On Wed, Oct 12, 2011 at 1:03 PM, Enayetur Raheem <eraheem@wechealthunit.org> wrote:
> Dear listers
>
> I am trying to remove a particular expression/pattern from the address field, and retain the remaining portion. I can extract the portion I want to remove, but could not extract the remaining part.
>
> Consider the following data:
>
> clear
> input str60 address
> "#12-4905 Lakeway Drive, College Station, Texas 77845 USA"
> "#12 - 673 Jasmine Street, Los Angeles, CA 90024"
> "2376 First street, San Diego, CA 90126"
> "66666 West Central St, Tempe AZ 80068"
> "12345 Main St. Cambridge, MA 01238-1234"
> "12345 Main St  Sommerville  MA 01239-2345"
> "12345 Main St  Watertwon  MA 01239   USA"
> end
>
> I need to remove anything starting with "#" and ending with "-" in the 
> beginning of the field. That is, I need to remove
>
> "#12-" from the first case
> "#12 - " from the second case (note the space before and after the dash).
> Other cases will remain intact.
>
> The following code gives me the expressions I want to remove.
> gen apt = regexs(0) if regexm(address, "(^[\#][0-9]+[ ]*[\-][ ]*)")
>
> But I actually want to retain the remaining part. What would be the syntax for that? Any clue will be much appreciated.
>
> Thanks in advance.
>
> Enayet
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

