Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings
Date   Fri, 8 Jun 2012 15:04:51 -0400

A regular expression solution that allows for characters other than
">" and "%" at the start and finish.

Steve
sjsamuels@gmail.com

****************
clear
input str20 combo
">88.27821"
"91.53401%"
"       76m"
" -31.20785"
">-52.18793"
"39.94933%"
"      +61"
" 89.47855"
" +75.43917"
">82.67717"
"46.31095%"
"       81"
" 45.24185"
" 28.62701"
">77.13605"
"46.79793%"
"       62"
" 19.50868"
" 91.54968"
" 86.64407"
end
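* Strip leading and trailing blanks before matching
replace combo = trim(combo)
describe
* Groups: (1) optional one-character prefix, (2) the signed number,
* (3) its optional sign, (4) optional one-character suffix.
* [0-9]* after the decimal point lets single-digit values such as "5" match.
gen new1 = regexs(2)  ///
    if regexm(combo,"^([^0-9+-]?)((\+|\-)?[0-9]+\.?[0-9]*)([^0-9]?)$")
destring new1, replace
list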
********************************************************************
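
As a hedged aside on the pattern in the original question: "([0-9]*\.?[0-9]*)" is unanchored and every element in it is optional, so it can match an empty string at the very first character. When a value starts with a prefix such as ">", -regexm()- succeeds at once with an empty match and -regexs(1)- returns "". The sketch below assumes the -combo- and -new1- variables created above. The names -old- and -nomatch- are made up for illustration. It shows the two patterns side by side and flags any value the anchored pattern did not convert.

```stata
* Assumes -combo- and -new1- exist as created above.
* -old- and -nomatch- are illustrative names only.

* Unanchored pattern from the original question: it can match an
* empty string at the first character when a prefix is present
generate old = regexs(1) if regexm(combo, "([0-9]*\.?[0-9]*)")

* Flag observations that the anchored pattern did not convert
generate byte nomatch = missing(new1)

list combo old new1 nomatch
```

Listing the flagged observations is a cheap way to find prefixes or suffixes that the pattern does not yet allow for, such as a multi-character "+/-".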

On Jun 8, 2012, at 2:40 PM, Nick Cox wrote:

Cox's Third Law of string processing is "regex machinery is great, but
always check first if something simpler will work directly".

I really wouldn't want to support removing + and - characters
separately. You could be removing genuine information!
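
A small check of that risk, as a sketch on made-up data: the -destring- command quoted further down lists "-" among the characters to ignore, and -ignore()- is documented as removing the characters it is given. Comparing the raw string with the converted value for a genuinely negative entry shows whether the sign survives. The three values below are assumptions for illustration only.

```stata
clear
input str8 myvariable
"-12"
"+/-30"
">45"
end

* Same call as quoted in the thread; compare the converted value
* with the raw string for "-12" to see what happens to the sign
destring myvariable, ignore("+/-<>") generate(myvariable2)
list
```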

If the issue is solely the composite prefix, then

subinstr(myvariable, "+/-", "", 1)

is as direct as anything else for pre-processing. If need be you can
of course insist that the prefix must be a prefix

... if substr(myvariable, 1, 3) == "+/-"

The single character is char(177) in my flavour of Stata. Try
-asciiplot- (SSC) to see if yours agrees.

subinstr(myvariable, char(177), "", .)

is what I would try.

I like -destring- too.
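
Put together, the steps above might look like the following sketch. It assumes a string variable named -myvariable- as in the thread. The toy values and the names -clean- and -value- are made up for illustration.

```stata
clear
input str12 myvariable
"+/-30"
"+/-4.5"
"-12"
"88.2"
end

* Work on a copy so the raw strings are kept for checking
generate clean = trim(myvariable)

* Remove the composite prefix only where it really is a prefix
replace clean = subinstr(clean, "+/-", "", 1) if substr(clean, 1, 3) == "+/-"

* Single-character plus-minus sign: char(177) as in the post above;
* in Stata 14 or later, where strings are Unicode, uchar(177) would
* be the function to try instead
replace clean = subinstr(clean, char(177), "", .)

* Convert to numeric; only the "+/-" prefix was removed, so a genuine
* leading minus sign is left in place
destring clean, generate(value)
list
```

Keeping the raw variable alongside the converted one makes it easy to spot entries where a sign or other genuine information was stripped by mistake.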

Nick

On Fri, Jun 8, 2012 at 7:12 PM, Richard Herron
<richard.c.herron@gmail.com> wrote:

> Thanks, David! That's big. I hadn't noticed the -ignore()- option in -destring-.
> 
> But what if I don't know the set of possible prefixes? I guess
> -destring- will throw an error and I iteratively improve my filter?
> 
> I have some where +/- is almost like a LaTeX \pm symbol where the + is
> stacked on the -. I think this is unicode U+00B1.
> http://www.fileformat.info/info/unicode/char/b1/index.htm
> 
> Can I use -destring- to -ignore()- these?

> On Fri, Jun 8, 2012 at 1:59 PM, David Radwin <dradwin@mprinc.com> wrote:
>> Can you use -destring- with the -ignore- option like this?
>> 
>> . destring myvariable, ignore("+/-<>") generate(myvariable2)
>> 
>> David
>> --
>> David Radwin
>> Senior Research Associate
>> MPR Associates, Inc.
>> 2150 Shattuck Ave., Suite 800
>> Berkeley, CA 94704
>> Phone: 510-849-4942
>> Fax: 510-849-0794
>> 
>> www.mprinc.com
>> 
>> 
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
>>> statalist@hsphsun2.harvard.edu] On Behalf Of Richard Herron
>>> Sent: Friday, June 08, 2012 10:30 AM
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: Remove prefixes (e.g., >, <, and +/-) from numbers stored as
>>> strings
>>> 
>>> I have numbers stored as string with prefixes (e.g., "+/-30") that I
>>> would like to convert to numbers. Not all entries necessarily have
>>> prefixes (or postfixes).
>>> 
>>> With -regexm()- and -regexs()- I can remove postfixes and handle
>>> decimals, but I can't remove prefixes. Can you spot my error with
>>> -regexm()-? Thanks!
>>> 
>>> Richard Herron
>>> 
>>> * begin code
>>> clear
>>> set obs 20
>>> generate number = 100*runiform()
>>> generate prefix = ""
>>> generate postfix = ""
>>> foreach i of numlist 1 5 10 15 {
>>>     replace prefix = ">" in `i'
>>>     replace postfix = "%" in `=`i' + 1'
>>>     replace number = int(number) in `=`i' + 2'
>>> }
>>> egen combo = concat(prefix number postfix)
>>> generate number2 = regexs(1) if regexm(combo, "([0-9]*\.?[0-9]*)")
>>> list
>>> * end code
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/