Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: getting part of strings
Date: Sun, 27 Mar 2011 17:44:21 +0100
This is a tedious but not difficult conversion job so far as I can see. For example, -asciiplot- on my machine shows char(192) through char(196) as various accented upper case A. So, I would map those all to A, which is char(65). You don't need to store anything.

qui forval j = 192/196 {
    replace myvar = subinstr(myvar, "`=char(`j')'", "A", .)
}

As Eric pointed out, -charlist- from SSC shows which characters there are in your variable and -asciiplot- from SSC gives you a visual table.

Nick

On Sun, Mar 27, 2011 at 4:57 PM, Daniel Marcelino <dmsilva.br@gmail.com> wrote:
> I get it. However, this thread led me to an old issue of mine: how to
> strip language marks (accents) from strings, replacing each accented
> letter with its plain counterpart, like "Ô" with "O" or "È" with "E".
> So, maybe I can store a local table of corresponding letters and
> run it in a loop over each line of the string variable. What do you
> think about it?
>
> /****/
> clear
> inp str200 var1
> "45123 - ANTÔNIO HERVÁZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
> end
>
> // accent table (pseudocode, not Stata syntax)
> local accent = {
>     ['á'] = 'a',
>     ['à'] = 'a',
>     ['ã'] = 'a',
>     ['é'] = 'e',
>     ['è'] = 'e',
>     ['É'] = 'E',
>     ['Ó'] = 'O',
>     ['í'] = 'i',
>     ['Í'] = 'I',
>     ['ü'] = 'u',
>     ['Ü'] = 'U',
> }
>
> On Sun, Mar 27, 2011 at 1:17 AM, Eric Booth <ebooth@ppri.tamu.edu> wrote:
>> <>
>>
>> On Mar 26, 2011, at 10:10 PM, Rebecca Pope wrote:
>>
>>> Daniel,
>>> You could try using char(). The ASCII code for "A" is 65; for "Z"
>>> it is 90. Maybe something like this would work for you (piggy-backing
>>> on Nick's earlier suggestion):
>>>
>>> clonevar copy = var1
>>> replace copy = upper(copy)
>>> qui forval i = 65/90 {
>>>     local letter = char(`i')
>>>     replace copy = subinstr(copy, "`letter'", "", .)
>>> }
>>
>> Another option is to use c(alpha) and c(ALPHA) for standard alpha characters
>> ********modifying NJC's example:
>> clonevar copy = var1
>> qui foreach i in `c(alpha)' `c(ALPHA)' {
>>     replace copy = subinstr(copy, "`i'", "", .)
>> }
>> *******
>>
>>>
>>> This won't work for all of your text (e.g. Ã). I don't know of any way
>>> to look the numeric values up in Stata, so I'll plug a previous post
>>> by Nick
>>> (http://www.stata.com/statalist/archive/2006-12/msg00446.html) and
>>> advise you to look up the codes for any accented letters by
>>> searching the internet for "ANSI character code chart". You'll need to
>>> modify the code above to add any additional numbers you need & switch
>>> to -foreach- with -numlist-.
>>
>> Take a look at -ascii- and -asciiplot- from SSC.
>> Also, you can get a list of all the chars used in var1 with -charlist- from SSC.
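
The loop for accented upper case A in the reply above extends naturally to the other vowels. What follows is only a minimal sketch, not code from the thread. It assumes single-byte Latin-1 (ISO 8859-1) strings, as in the Stata releases current in 2011, and it assumes the character codes on your machine match the Latin-1 chart, which is worth checking with -asciiplot- or -charlist- first. The variable name clean and the macro names codes_A, codes_E and so on are made up for illustration.

```stata
* Sketch: replace accented vowels with plain letters, one code range per letter.
* Assumes single-byte Latin-1 strings; codes should be verified with -asciiplot-.
clear
input str200 var1
"45123 - ANTÔNIO HERVÁZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
"1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
end

* work on a copy so the original variable is left untouched
clonevar clean = var1

* Latin-1 code ranges for each accented vowel (illustrative macro names)
local codes_A 192/196
local codes_E 200/203
local codes_I 204/207
local codes_O 210/214
local codes_U 217/220
local codes_a 224/228
local codes_e 232/235
local codes_i 236/239
local codes_o 242/246
local codes_u 249/252

* outer loop: target letter; inner loop: every code that maps to it
quietly foreach L in A E I O U a e i o u {
    forvalues j = `codes_`L'' {
        replace clean = subinstr(clean, char(`j'), "`L'", .)
    }
}

list clean
```

Other characters, such as the cedilla forms at codes 199 and 231, could be handled with one extra -replace- each. Note that from Stata 14 onward strings are Unicode, so char() with codes above 127 no longer yields these characters and the sketch would need to be reworked around the Unicode string functions.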