I pushed this a bit further. A help file (not included here) spells out the assumptions (and limitations) which can also be inferred from the code. I'll ask Kit Baum to put this up on SSC to complete the loop.

Mata isn't essential for this problem, but it makes the problem more fun.

*! 1.0.0 NJC 20 December 2010
program romantoarabic
    version 9
    syntax varname(string) [if] [in] , Generate(str)

    quietly {
        marksample touse, strok
        count if `touse'
        if r(N) == 0 error 2000

        confirm new variable `generate'

        tempvar work
        gen `work' = upper(trim(itrim(`varlist'))) if `touse'
        gen `generate' = .

        mata : roman_to_arabic("`work'", "`generate'", "`touse'")

        count if `work' != "" & `touse'
        replace `generate' = . if `work' != "" & `touse'
    }

    if r(N) {
        di _n as txt "Problematic input: "
        list `varlist' if `work' != "" & `touse'
    }
end

mata :

void roman_to_arabic(string scalar varname,
                     string scalar genname,
                     string scalar usename) {

    string colvector work
    real colvector y

    work = st_sdata(., varname, usename)
    y = J(rows(work), 1, 0)

    y = y + 900 * (strpos(work, "CM") :> 0)
    work = subinstr(work, "CM", "", .)
    y = y + 400 * (strpos(work, "CD") :> 0)
    work = subinstr(work, "CD", "", .)
    y = y + 90 * (strpos(work, "XC") :> 0)
    work = subinstr(work, "XC", "", .)
    y = y + 40 * (strpos(work, "XL") :> 0)
    work = subinstr(work, "XL", "", .)
    y = y + 9 * (strpos(work, "IX") :> 0)
    work = subinstr(work, "IX", "", .)
    y = y + 4 * (strpos(work, "IV") :> 0)
    work = subinstr(work, "IV", "", .)

    while (sum(strpos(work, "M"))) {
        y = y + 1000 * (strpos(work, "M") :> 0)
        work = subinstr(work, "M", "", 1)
    }
    while (sum(strpos(work, "D"))) {
        y = y + 500 * (strpos(work, "D") :> 0)
        work = subinstr(work, "D", "", 1)
    }
    while (sum(strpos(work, "C"))) {
        y = y + 100 * (strpos(work, "C") :> 0)
        work = subinstr(work, "C", "", 1)
    }
    while (sum(strpos(work, "L"))) {
        y = y + 50 * (strpos(work, "L") :> 0)
        work = subinstr(work, "L", "", 1)
    }
    while (sum(strpos(work, "X"))) {
        y = y + 10 * (strpos(work, "X") :> 0)
        work = subinstr(work, "X", "", 1)
    }
    while (sum(strpos(work, "V"))) {
        y = y + 5 * (strpos(work, "V") :> 0)
        work = subinstr(work, "V", "", 1)
    }
    while (sum(strpos(work, "I"))) {
        y = y + (strpos(work, "I") :> 0)
        work = subinstr(work, "I", "", 1)
    }

    st_store(., genname, usename, y)
    st_sstore(., varname, usename, work)
}

end

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 17 December 2010 19:39
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: RE: Change roman to Arabic numerals

I think anyone tempted to write this would be best advised to extract the subtraction parts of the syntax first, i.e. CM etc. (Also, from what I recall IIII is sometimes allowed as a non-standard variant of IV.)

Here is one stab. This is a Mata function that works on a string vector of Roman numerals in upper case.

Example first:

. mata
: stuff = ("IV", "MCMIV")

: roman_to_arabic(stuff)
          1      2
    +---------------+
  1 |     4   1904  |
    +---------------+

: roman_to_arabic(stuff')
          1
    +--------+
  1 |     4  |
  2 |  1904  |
    +--------+

: end

Code second:

mata :

real roman_to_arabic(string vector roman) {
    numeric vector ro
    string vector work

    ro = J(rows(roman), cols(roman), 0)
    work = roman

    ro = ro + 900 * (strpos(work, "CM") :> 0)
    work = subinstr(work, "CM", "", .)
    ro = ro + 400 * (strpos(work, "CD") :> 0)
    work = subinstr(work, "CD", "", .)
    ro = ro + 90 * (strpos(work, "XC") :> 0)
    work = subinstr(work, "XC", "", .)
    ro = ro + 40 * (strpos(work, "XL") :> 0)
    work = subinstr(work, "XL", "", .)
    ro = ro + 9 * (strpos(work, "IX") :> 0)
    work = subinstr(work, "IX", "", .)
    ro = ro + 4 * (strpos(work, "IV") :> 0)
    work = subinstr(work, "IV", "", .)

    while (sum(strpos(work, "M"))) {
        ro = ro + 1000 * (strpos(work, "M") :> 0)
        work = subinstr(work, "M", "", 1)
    }
    while (sum(strpos(work, "D"))) {
        ro = ro + 500 * (strpos(work, "D") :> 0)
        work = subinstr(work, "D", "", 1)
    }
    while (sum(strpos(work, "C"))) {
        ro = ro + 100 * (strpos(work, "C") :> 0)
        work = subinstr(work, "C", "", 1)
    }
    while (sum(strpos(work, "L"))) {
        ro = ro + 50 * (strpos(work, "L") :> 0)
        work = subinstr(work, "L", "", 1)
    }
    while (sum(strpos(work, "X"))) {
        ro = ro + 10 * (strpos(work, "X") :> 0)
        work = subinstr(work, "X", "", 1)
    }
    while (sum(strpos(work, "V"))) {
        ro = ro + 5 * (strpos(work, "V") :> 0)
        work = subinstr(work, "V", "", 1)
    }
    while (sum(strpos(work, "I"))) {
        ro = ro + (strpos(work, "I") :> 0)
        work = subinstr(work, "I", "", 1)
    }

    return(ro)
}

end

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lachenbruch, Peter
Sent: 17 December 2010 18:50
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: Change roman to Arabic numerals

A colleague wants to generate Arabic numbers from Roman numerals and I was wondering if anyone has written a routine for this. She only has I to X so I suggested

gen numb = (rom=="I") + 2*(rom=="II") + 3*(rom=="III") + 4*(rom=="IV")

etc. This is OK for this application, but not if we have many numbers. Of course the ordering gets messed up (I, II, III, IV, IX, V, VI, VII, VIII, X), so encode won't work, and gen numb=real(rom) won't do either.
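
For readers who want to check the logic outside Stata, the approach in the replies above (take out the subtractive pairs CM, CD, XC, XL, IX and IV first, then count the remaining single symbols) can be sketched in Python. This is only an illustrative mirror of the Mata code, not part of the original posts: the function name, the two lookup tables and the returned leftover string are assumptions made for the sketch. Like the Mata code, it does not check that the numeral is well formed; it only reports characters it could not use.

```python
# Hypothetical Python mirror of the Mata logic; all names are illustrative.
SUBTRACTIVE = [("CM", 900), ("CD", 400), ("XC", 90),
               ("XL", 40), ("IX", 9), ("IV", 4)]
ADDITIVE = [("M", 1000), ("D", 500), ("C", 100), ("L", 50),
            ("X", 10), ("V", 5), ("I", 1)]


def roman_to_arabic(roman):
    """Return (value, leftover). A non-empty leftover marks problematic input."""
    # Upper-case and squeeze blanks, roughly like upper(trim(itrim())).
    work = " ".join(roman.upper().split())
    value = 0
    # Subtractive pairs first: each pair is counted once if present,
    # and every occurrence is then removed, as in the Mata code.
    for pair, amount in SUBTRACTIVE:
        if pair in work:
            value += amount
            work = work.replace(pair, "")
    # Remaining single symbols are simply added up, once per occurrence.
    for symbol, amount in ADDITIVE:
        value += amount * work.count(symbol)
        work = work.replace(symbol, "")
    return value, work


if __name__ == "__main__":
    print(roman_to_arabic("IV"))     # (4, '')
    print(roman_to_arabic("MCMIV"))  # (1904, '')
```

Applied to the alphabetical list from the question (I, II, III, IV, IX, V, VI, VII, VIII, X), the sketch gives 1, 2, 3, 4, 9, 5, 6, 7, 8, 10, so sorting on the converted value restores the numeric order that encode loses.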

