
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: getting part of strings


From   Daniel Marcelino <[email protected]>
To   [email protected]
Subject   Re: st: getting part of strings
Date   Sun, 27 Mar 2011 12:57:55 -0300

I get it. However, this thread reminded me of an old question of mine:
how to strip language marks (accents) from strings, replacing each
accented letter with its plain counterpart, like "Ô" with "O" or "È" with "E".
So, maybe I can store a lookup table of corresponding letters in locals
and loop over it for the string variable. What do you think?

/****/
clear
inp str200 var1
"45123 - ANTÔNIO HERVÁZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
"1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
end

// accent table as two parallel lists:
// the i-th word of `from' maps to the i-th word of `to'
local from á à ã é è É Ó í Í ü Ü
local to   a a a e e E O i I u U
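
One way to carry out the loop idea, as a minimal sketch rather than a definitive answer: keep the accented letters and their replacements in two parallel locals and call subinstr() once per pair. The names from, to, and clean are illustrative, and the entries Ô, Á, and Ã are added here (they are not in the table above) only because they occur in the example data. It also assumes the do-file and the data use the same character encoding.

```stata
clear
input str200 var1
"45123 - ANTÔNIO HERVÁZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
"1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
end

* parallel lists: word i of `from' is replaced by word i of `to'
* (Ô, Á, Ã are assumed additions so the example data is fully covered)
local from á à ã é è É Ó í Í ü Ü Ô Á Ã
local to   a a a e e E O i I u U O A A

* work on a copy so the original variable is preserved
clonevar clean = var1

local n : word count `from'
forvalues i = 1/`n' {
    local f : word `i' of `from'
    local t : word `i' of `to'
    * the final . asks subinstr() to replace every occurrence
    quietly replace clean = subinstr(clean, "`f'", "`t'", .)
}

list clean
```

Because replace acts on all observations at once, the loop runs over the letter pairs only; no loop over the lines of the string variable should be needed.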



On Sun, Mar 27, 2011 at 1:17 AM, Eric Booth <[email protected]> wrote:
> <>
>
> On Mar 26, 2011, at 10:10 PM, Rebecca Pope wrote:
>
>> Daniel,
> >> You could try using char(). The ASCII code for "A" is 65; for "Z"
> >> it is 90. Maybe something like this would work for you (piggy-backing
> >> on Nick's earlier suggestion):
> >>
> >> clonevar copy = var1
> >> replace copy = upper(copy)
> >> qui forval i = 65/90 {
>>     local letter = char(`i')
>>     replace copy = subinstr(copy, "`letter'", "", .)
>> }
>
> Another option is to use c(alpha) and c(ALPHA) for standard alpha characters
> ********modifying NJC's example:
> clonevar copy = var1
> qui foreach i in `c(alpha)' `c(ALPHA)'  {
>           replace copy = subinstr(copy, "`i'", "", .)
> }
> *******
>
>>
>> This won't work for all of your text (e.g. Ã). I don't know of any way
>> to look the numeric values up in Stata, so I'll plug a previous post
>> by Nick
>> (http://www.stata.com/statalist/archive/2006-12/msg00446.html) and
>> advise you to look up the ASCII codes for any accented letters by
>> searching the internet for "ANSI character code chart". You'll need to
>> modify the code above to add any additional numbers you need & switch
>> to -foreach- with -numlist-.
>
> Take a look at -ascii- and -asciiplot- from SSC.
> Also, you can get a list of all the chars used in var1 with -charlist- from SSC.
>
>
> - Eric
>
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


