Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: getting part of strings
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: Re: st: getting part of strings
Date: Sat, 26 Mar 2011 22:10:37 -0500
Daniel,
You could try using char(). The ASCII code for "A" is 65; for "Z"
it is 90. Maybe something like this would work for you (piggy-backing
on Nick's earlier suggestion):
clonevar copy = var1
replace copy = upper(copy)
qui forval i = 65/90 {
    local letter = char(`i')
    replace copy = subinstr(copy, "`letter'", "", .)
}
This won't work for all of your text (e.g. Ã), because accented letters
fall outside the A-Z range. I don't know of any way to look up the
numeric values in Stata, so I'll plug a previous post by Nick
(http://www.stata.com/statalist/archive/2006-12/msg00446.html) and
advise you to look up the codes for any accented letters by
searching the internet for "ANSI character code chart". You'll need to
modify the code above to add any additional numbers you need and switch
to -foreach- with -numlist-.
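A minimal sketch of that -foreach- with -numlist- variant follows. It is
an illustration under stated assumptions, not a tested solution for your
data: the two-line example dataset is made up, the extra code 195 (the
Latin-1 / Windows-1252 code for "Ã") is only a placeholder for whatever
codes you actually need, and char() is assumed to be working on
single-byte text as in Stata 13 and earlier (in Unicode Stata, 14 and
later, uchar() would be needed for accented letters). Mata's ascii()
function is one possible way to look up codes from within Stata.

```stata
* Sketch only; see the assumptions stated above.
clear
input str40 var1
"1212 - DAMIAO - PB"
"155 - vital - pb"
end

* One way to look up codes inside Stata: Mata's ascii() returns the
* byte codes of a string (65 and 90 for "AZ").
mata: ascii("AZ")

clonevar copy = var1
replace copy = upper(copy)

* 65/90 covers A-Z; 195 is an illustrative extra code for an accented
* letter in a single-byte encoding. Add further codes to the numlist.
qui foreach i of numlist 65/90 195 {
    replace copy = subinstr(copy, char(`i'), "", .)
}

list var1 copy
```

Note that upper() may leave lowercase accented letters untouched, so
their codes would have to be added to the numlist separately.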
Hope this helps.
Best,
Rebecca
On Sat, Mar 26, 2011 at 8:42 PM, Daniel Marcelino <[email protected]> wrote:
>
> That technique is impressive and makes me wonder whether I could do the
> opposite job, namely strip out all alpha characters. However, just
> changing to i = a/z does not seem to work.
>
> Daniel
>
>
> On Sat, Mar 26, 2011 at 9:31 PM, Nick Cox <[email protected]> wrote:
> > Another technique that might be helpful is to strip out all the
> > numeric characters first, e.g. by
> >
> > clonevar copy = var1
> > qui forval i = 0/9 {
> >     replace copy = subinstr(copy, "`i'", "", .)
> > }
> >
> > Nick
> > On Sat, Mar 26, 2011 at 8:50 PM, Daniel Marcelino <[email protected]> wrote:
> >> Thanks for helping, I'll work on the code for required output. I
> >> thought Eric Booth's example pretty insightful for my needs. My data
> >> is not delimited by dash "-", rather it is by ";". However, the
> >> original source has a variable with many things nested, so, I want to
> >> split those names, parties, offices and numbers ids into different
> >> variables.
> >>
> >> Best
> >> Daniel
> >>
> >> On Sat, Mar 26, 2011 at 4:42 PM, Eric Booth <[email protected]> wrote:
> >>> <>
> >>> Daniel:
> >>> I missed the part in your post where you want to capture PB and PP as well.
> >>> You could grab these from the var1? that contains this information from my previous example, or another approach entirely is to use the string functions (see -help string_functions-) subinstr() or strpos() to generate indicators if var1 contains the substrings of interest -- this allows you to skip the -split- or regex* approaches completely if this is what you need from var1:
> >>>
> >>> ***********************!
> >>> clear
> >>> inp str200 var1
> >>> "155 - VITAL DO REGO FILHO - PB - Senador"
> >>> "1111 - - PP - - Deputado Federal / 25888 - ATAIDES MENDES PEDROSA -PB - Deputado Estadual"
> >>> "1111 - - PP - - Deputado Federal / 22333 - EDNALDO PEREIRA DESANTANA - PB - Deputado Estadual"
> >>> "151 - JOSE WILSON SANTIAGO - PB - Senador"
> >>> "45123 - ANTONIO HERVAZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
> >>> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
> >>> end
> >>>
> >>> g DF = 1 if strpos(var1, "Deputado Federal")
> >>> g DE = 1 if strpos(var1, "Deputado Estadual")
> >>> g S = 1 if strpos(var1, "Senador")
> >>> g PP = 1 if strpos(var1, "PP")
> >>> g PB = 1 if strpos(var1, "PB")
> >>> order D* P* S
> >>> ***********************!
> >>>
> >>> - Eric
> >>> __
> >>> Eric A. Booth
> >>> Public Policy Research Institute
> >>> Texas A&M University
> >>> [email protected]
> >>> Office: +979.845.6754
> >>>
> >>>
> >>>
> >>> On Mar 26, 2011, at 2:30 PM, Eric Booth wrote:
> >>>
> >>>> ***********************!
> >>>> clear
> >>>> inp str200 var1
> >>>> "155 - VITAL DO REGO FILHO - PB - Senador"
> >>>> "1111 - - PP - - Deputado Federal / 25888 - ATAIDES MENDES PEDROSA -PB - Deputado Estadual"
> >>>> "1111 - - PP - - Deputado Federal / 22333 - EDNALDO PEREIRA DESANTANA - PB - Deputado Estadual"
> >>>> "151 - JOSE WILSON SANTIAGO - PB - Senador"
> >>>> "45123 - ANTONIO HERVAZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
> >>>> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
> >>>> end
> >>>>
> >>>> **using split**
> >>>> replace var1 = subinstr(var1, " / ", " - ", .)
> >>>> split var1, p("-")
> >>>>
> >>>> **trim spaces in new vars**
> >>>> ds var1?
> >>>> foreach v in `r(varlist)' {
> >>>>     replace `v' = trim(`v')
> >>>> }
> >>>>
> >>>>
> >>>> **it looks like the substrings you want are in var14, var15, var19:
> >>>> l var14 var15 var19
> >>>>
> >>>> **grab the title or subtitle or gen an indicator if they are present**
> >>>> g str50 title = var14 if !mi(var14)
> >>>> replace title = var15 if mi(title) & !mi(var15)
> >>>> g str50 title2 = var19 if !mi(var19)
> >>>> l var1 title title2
> >>>> **or
> >>>> g titleind = 1 if !mi(var14) | !mi(var15)
> >>>> g title2ind = 1 if !mi(var19)
> >>>> order *ind
> >>>> ***********************!
> >>>
> >>>
> >>>
> >>> *
> >>> * For searches and help try:
> >>> * http://www.stata.com/help.cgi?search
> >>> * http://www.stata.com/support/statalist/faq
> >>> * http://www.ats.ucla.edu/stat/stata/