

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings
Date: Fri, 8 Jun 2012 15:04:51 -0400

A regular expression solution that allows for characters other than "> and %" at start and finish.

Steve
sjsamuels@gmail.com

****************
clear
input str20 combo
">88.27821"
"91.53401%"
" 76m"
" -31.20785"
">-52.18793"
"39.94933%"
" +61"
" 89.47855"
" +75.43917"
">82.67717"
"46.31095%"
" 81"
" 45.24185"
" 28.62701"
">77.13605"
"46.79793%"
" 62"
" 19.50868"
" 91.54968"
" 86.64407"
end
replace combo = trim(combo)
des
gen new1 =regexs(2) ///
    if regexm(combo,"^([^0-9+-]?)((\+|\-)?[0-9]+\.?[0-9]+)([^0-9]?)$")
destring new1,replace
list
********************************************************************

On Jun 8, 2012, at 2:40 PM, Nick Cox wrote:

Cox's Third Law of string processing is "regex machinery is great, but always check first if something simpler will work directly".

I really wouldn't want to support removing + and - characters separately. You could be removing genuine information!

If the issue is solely the composite prefix, then

subinstr(myvariable, "+/-", "", 1)

is as direct as anything else for pre-processing. If need be you can of course insist that the prefix must be a prefix

... if substr(myvariable, 1, 3) == "+/-"

The single character is char(177) in my flavour of Stata. Try -asciiplot- (SSC) to see if yours agrees

subinstr(myvariable, char(177), "", .)

is what I would try. I like -destring- too.

Nick

On Fri, Jun 8, 2012 at 7:12 PM, Richard Herron <richard.c.herron@gmail.com> wrote:
> Thanks, David! That's big. I hadn't noticed the -ignore()- option in -destring-.
>
> But what if I don't know the set of possible prefixes? I guess
> -destring- will throw an error and I iteratively improve my filter?
>
> I have some where +/- is almost like a LaTeX \pm symbol where the + is
> stacked on the -. I think this is unicode U+00B1.
> http://www.fileformat.info/info/unicode/char/b1/index.htm
>
> Can I use -destring- to -ignore()- these?
>
> On Fri, Jun 8, 2012 at 1:59 PM, David Radwin <dradwin@mprinc.com> wrote:
>> Can you use -destring- with the -ignore- option like this?
>>
>> . destring myvariable, ignore("+/-<>") generate(myvariable2)
>>
>> David
>> --
>> David Radwin
>> Senior Research Associate
>> MPR Associates, Inc.
>> 2150 Shattuck Ave., Suite 800
>> Berkeley, CA 94704
>> Phone: 510-849-4942
>> Fax: 510-849-0794
>>
>> www.mprinc.com
>>
>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Richard Herron
>>> Sent: Friday, June 08, 2012 10:30 AM
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: Remove prefixes (e.g., >, <, and +/-) from numbers stored as
>>> strings
>>>
>>> I have numbers stored as string with prefixes (e.g., "+/-30") that I
>>> would like to convert to numbers. Not all entries necessarily have
>>> prefixes (or postfixes).
>>>
>>> With -regexm()- and -regexs()- I can remove from postfixes and handle
>>> decimals, but I can't remove prefixes. Can you spot my error with
>>> -regexm()-? Thanks!
>>>
>>> Richard Herron
>>>
>>> * begin code
>>> clear
>>> set obs 20
>>> generate number = 100*runiform()
>>> generate prefix = ""
>>> generate postfix = ""
>>> foreach i of numlist 1 5 10 15 {
>>> replace prefix = ">" in `i'
>>> replace postfix = "%" in `=`i' + 1'
>>> replace number = int(number) in `=`i' + 2'
>>> }
>>> egen combo = concat(prefix number postfix)
>>> generate number2 = regexs(1) if regexm(combo, "([0-9]*\.?[0-9]*)")
>>> list
>>> * end code
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
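For anyone who wants to try the simpler, non-regex route discussed in the thread, the following is a minimal sketch that combines the -substr()- prefix check with -destring, ignore()-. The toy data and the variable names raw, clean, and value are made up for illustration and do not come from the original posts. It assumes the only composite prefix is the three-character "+/-" and that the only other stray characters are <, >, and %.

```stata
* Sketch only: toy data and the names raw, clean, value are hypothetical
clear
input str12 raw
"+/-30"
">88.27"
"<5"
"39.9%"
"-31.2"
end

* Drop the composite "+/-" only where it really is a prefix, so that a
* genuine leading minus sign (as in "-31.2") is left alone
generate clean = raw
replace clean = substr(clean, 4, .) if substr(clean, 1, 3) == "+/-"

* Let -destring- discard the remaining non-numeric markers
destring clean, generate(value) ignore("<>%")
list
```

Leaving + and - out of ignore() is deliberate: it keeps the sign information that would be lost if those characters were stripped one at a time.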

**Follow-Ups**:
- **Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings** *From:* Richard Herron <richard.c.herron@gmail.com>

**References**:
- **st: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings** *From:* Richard Herron <richard.c.herron@gmail.com>
- **st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings** *From:* "David Radwin" <dradwin@mprinc.com>
- **Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings** *From:* Richard Herron <richard.c.herron@gmail.com>
- **Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings** *From:* Nick Cox <njcoxstata@gmail.com>
