From: Enayetur Raheem <eraheem@wechealthunit.org>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: Removing a particular expression from string variable
Date: Wed, 12 Oct 2011 17:03:42 +0000

Dear listers,

I am trying to remove a particular expression/pattern from the address field and retain the remaining portion. I can extract the portion I want to remove, but I could not extract the remaining part. Consider the following data:

    clear
    input str60 address
    "#12-4905 Lakeway Drive, College Station, Texas 77845 USA"
    "#12 - 673 Jasmine Street, Los Angeles, CA 90024"
    "2376 First street, San Diego, CA 90126"
    "66666 West Central St, Tempe AZ 80068"
    "12345 Main St. Cambridge, MA 01238-1234"
    "12345 Main St Sommerville MA 01239-2345"
    "12345 Main St Watertwon MA 01239 USA"
    end

I need to remove anything starting with "#" and ending with "-" at the beginning of the field. That is, I need to remove "#12-" from the first case and "#12 - " from the second case (note the space before and after the dash). The other cases should remain intact.

The following code gives me the expressions I want to remove:

    gen apt = regexs(0) if regexm(address, "(^[\#][0-9]+[ ]*[\-][ ]*)")

But I actually want to retain the remaining part. What would be the syntax for that?

Any clue will be much appreciated. Thanks in advance.

Enayet
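
One possible way to keep the remainder, offered as a sketch and not as the list's definitive answer, is Stata's regexr() function, which replaces the first match of a regular expression with a given string. Replacing the leading "#<digits> - " pattern with an empty string should leave the rest of the address. The variable name rest is hypothetical, and the pattern is a simplified form of the one in the post (no escapes, no capture group).

```stata
* Sketch only: "rest" is a hypothetical variable name, not from the original post.
clear
input str60 address
"#12-4905 Lakeway Drive, College Station, Texas 77845 USA"
"#12 - 673 Jasmine Street, Los Angeles, CA 90024"
"2376 First street, San Diego, CA 90126"
end

* regexr(s, re, s2) replaces the first match of re in s with s2.
* Addresses that do not start with "#<digits>-" do not match and are returned unchanged.
gen rest = regexr(address, "^#[0-9]+[ ]*-[ ]*", "")

list address rest
```

With this pattern, the first two addresses should start at "4905 Lakeway Drive" and "673 Jasmine Street", while the third is left as it was. Because the pattern is anchored with ^, a "#" appearing later in an address would not be touched.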