Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RE: Change roman to Arabic numerals |
Date | Tue, 21 Dec 2010 09:53:15 -0800 |
Thanks. Nick sent me a more polished version directly and it works beautifully. He is hoping to send it to SSC. He's not convinced the audience for this has n>1.

At any rate, thanks again, Nick. And ho ho ho.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Monday, December 20, 2010 1:01 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Change roman to Arabic numerals

Hello.

Peter writes that his colleague had to deal only with Roman numerals in the range I-X. For that situation, another (one-line) solution can be suggested:

generate byte arabic = strpos("=====I====II===III==IV===V====VI===VII==VIII=IX===X====","="+roman+"=")/5 if !missing(roman)

Below is a demo. Obviously the code is not easily extended to larger ranges, or to allow dual notation for 4: "IV" and "IIII".

Nick's code does a great job of converting numbers both small and large; however, it appears to be too robust, converting even misspelled Roman numerals such as IM (999) or IL (49). Both notations may occur in practice (depending on the source of the data). Wikipedia describes them as forms that "would not be generally accepted". I think it would be great to modify the code either to report them as erroneous (misspelled Roman numerals) or to convert them according to the respondents' likely intent (999 and 49, respectively), but not to incorrect values (1001 and 51, respectively) as the current version does.

Numbers like "IIIIIIIIIIIIIIIIIIIII" are converted correctly, but perhaps it would be better to report an argument error in such a case. Similarly, characters that are not Roman numeral symbols are currently not reported as errors (e.g. "K" in "XK", although one could encounter "K" in exotic medieval Roman numerals...).
This wouldn't be a problem if the program handled all symbols of the Roman numerals, but e.g. "S" (half) is not handled (again see Wikipedia for reference).

Best regards,
Sergiy Radyakin

// ** DEMO *************************************************************************
version 10
clear all
input str10 roman
"I"
"II"
"III"
"IV"
"V"
"VI"
"VII"
"VIII"
"IX"
"X"
end
local romans "=====I====II===III==IV===V====VI===VII==VIII=IX===X===="
gen byte arabic = strpos(`"`romans'"',"="+roman+"=")/5 if !missing(roman)
list
// ***************************************************************************

On Fri, Dec 17, 2010 at 2:38 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think anyone tempted to write this would be best advised to extract the subtraction parts of the syntax first, i.e. CM etc.
>
> (Also, from what I recall IIII is sometimes allowed as a non-standard variant of IV.)
>
> Here is one stab. This is a Mata function that works on a string vector of Roman numerals in upper case.
>
> Example first:
>
> . mata
>
> : stuff = ("IV", "MCMIV")
>
> : roman_to_arabic(stuff)
>           1      2
>     +---------------+
>   1 |     4   1904  |
>     +---------------+
>
> : roman_to_arabic(stuff')
>           1
>     +--------+
>   1 |     4  |
>   2 |  1904  |
>     +--------+
>
> : end
>
> Code second:
>
> mata :
>
> real roman_to_arabic(string vector roman) {
>
>         numeric vector ro
>         string vector work
>         ro = J(rows(roman), cols(roman), 0)
>         work = roman
>
>         ro = ro + 900 * (strpos(work, "CM") :> 0)
>         work = subinstr(work, "CM", "", .)
>         ro = ro + 400 * (strpos(work, "CD") :> 0)
>         work = subinstr(work, "CD", "", .)
>         ro = ro + 90 * (strpos(work, "XC") :> 0)
>         work = subinstr(work, "XC", "", .)
>         ro = ro + 40 * (strpos(work, "XL") :> 0)
>         work = subinstr(work, "XL", "", .)
>         ro = ro + 9 * (strpos(work, "IX") :> 0)
>         work = subinstr(work, "IX", "", .)
>         ro = ro + 4 * (strpos(work, "IV") :> 0)
>         work = subinstr(work, "IV", "", .)
>
>         while (sum(strpos(work, "M"))) {
>                 ro = ro + 1000 * (strpos(work, "M") :> 0)
>                 work = subinstr(work, "M", "", 1)
>         }
>
>         while (sum(strpos(work, "D"))) {
>                 ro = ro + 500 * (strpos(work, "D") :> 0)
>                 work = subinstr(work, "D", "", 1)
>         }
>
>         while (sum(strpos(work, "C"))) {
>                 ro = ro + 100 * (strpos(work, "C") :> 0)
>                 work = subinstr(work, "C", "", 1)
>         }
>
>         while (sum(strpos(work, "L"))) {
>                 ro = ro + 50 * (strpos(work, "L") :> 0)
>                 work = subinstr(work, "L", "", 1)
>         }
>
>         while (sum(strpos(work, "X"))) {
>                 ro = ro + 10 * (strpos(work, "X") :> 0)
>                 work = subinstr(work, "X", "", 1)
>         }
>
>         while (sum(strpos(work, "V"))) {
>                 ro = ro + 5 * (strpos(work, "V") :> 0)
>                 work = subinstr(work, "V", "", 1)
>         }
>
>         while (sum(strpos(work, "I"))) {
>                 ro = ro + (strpos(work, "I") :> 0)
>                 work = subinstr(work, "I", "", 1)
>         }
>
>         return(ro)
> }
>
> end
>
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lachenbruch, Peter
> Sent: 17 December 2010 18:50
> To: 'statalist@hsphsun2.harvard.edu'
> Subject: st: Change roman to Arabic numerals
>
> A colleague wants to generate Arabic numbers from Roman numerals and I was wondering if anyone has written a routine for this. She only has I to X, so I suggested
>
> gen numb=(rom=="I")+2*(rom=="II")+3*(rom=="III")+4*(rom=="IV") etc.
>
> This is OK for this application, but not if we have many numbers. Of course the ordering gets messed up (I, II, III, IV, IX, V, VI, VII, VIII, X), so encode won't work, and gen numb=real(rom) won't do either.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
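
One possible way to act on Sergiy's suggestion that non-standard spellings be reported rather than converted is a round trip: parse the string leniently, write the result back out in its canonical spelling, and accept the input only if the two spellings agree. The Mata sketch below is not Nick's polished version or anything posted in this thread. The function names roman_symbol_value(), arabic_to_roman() and roman_to_arabic_strict() are made up for illustration, and it assumes upper-case input and a range of 1 to 3999.

```stata
mata:

// Hypothetical helper: value of one Roman symbol, or missing if unknown
real scalar roman_symbol_value(string scalar ch)
{
        real rowvector values
        real scalar    k

        values = (1, 5, 10, 50, 100, 500, 1000)
        k = strpos("IVXLCDM", ch)
        if (k == 0) return(.)
        return(values[k])
}

// Hypothetical helper: canonical spelling of an integer in 1..3999
// (the caller must guarantee that n is in range and not missing)
string scalar arabic_to_roman(real scalar n)
{
        real rowvector   values
        string rowvector symbols
        string scalar    out
        real scalar      i, rest

        values  = (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1)
        symbols = ("M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I")
        out  = ""
        rest = n
        for (i = 1; i <= cols(values); i++) {
                while (rest >= values[i]) {
                        out  = out + symbols[i]
                        rest = rest - values[i]
                }
        }
        return(out)
}

// Hypothetical strict converter: missing for anything that is not the
// canonical spelling of a number in 1..3999
real matrix roman_to_arabic_strict(string matrix roman)
{
        real matrix   result
        real scalar   i, j, k, len, total, v, vnext, bad
        string scalar s

        result = J(rows(roman), cols(roman), .)
        for (i = 1; i <= rows(roman); i++) {
                for (j = 1; j <= cols(roman); j++) {
                        s     = roman[i, j]
                        len   = strlen(s)
                        total = 0
                        bad   = 0
                        // lenient pass: a symbol smaller than its right-hand
                        // neighbour is subtracted, otherwise it is added
                        for (k = 1; k <= len; k++) {
                                v     = roman_symbol_value(substr(s, k, 1))
                                vnext = 0
                                if (k < len) vnext = roman_symbol_value(substr(s, k + 1, 1))
                                if (v == .) {
                                        bad = 1
                                }
                                else if (v < vnext) {
                                        total = total - v
                                }
                                else {
                                        total = total + v
                                }
                        }
                        // strict check: the canonical spelling must match the input
                        if (!bad & total >= 1 & total <= 3999) {
                                if (arabic_to_roman(total) == s) result[i, j] = total
                        }
                }
        }
        return(result)
}

stuff = ("IV", "MCMIV", "IM", "IL", "IIII", "XK")
roman_to_arabic_strict(stuff)

end
```

With these definitions the last call should return 4 and 1904 followed by four missing values. Treating IIII as an error is a design choice; if that variant should be accepted, it could be replaced by IV before the check.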