Re: st: getting part of strings

From	Daniel Marcelino <[email protected]>
To	[email protected]
Subject	Re: st: getting part of strings
Date	Sat, 26 Mar 2011 17:50:40 -0300

Thanks for the help; I'll work on the code to get the output I need. I
found Eric Booth's example very useful for my purposes. My data
are delimited by semicolons (";") rather than dashes ("-"). The
original source packs several items into a single variable, so I want to
split the names, parties, offices, and ID numbers into separate
variables.
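
A minimal sketch of that split for semicolon-delimited data. The two sample rows, the field order (ID; name; party; office), and the names part*, id, name, party, and office are assumptions for illustration; the real layout of the source variable is not shown in this thread.

```stata
* Hypothetical sample rows; the actual field layout is an assumption
clear
input str200 var1
"155;VITAL DO REGO FILHO;PB;Senador"
"45123;ANTONIO HERVAZIO BEZERRA CAVALCANTI;PB;Deputado Estadual"
end

* Split on ";" into part1, part2, ... (one new variable per piece)
split var1, parse(";") generate(part)

* Remove leading and trailing blanks from each piece
foreach v of varlist part* {
    replace `v' = trim(`v')
}

* Assumed field order: ID; name; party; office
rename part1 id
rename part2 name
rename part3 party
rename part4 office

* Convert the ID from string to numeric
destring id, replace

list id name party office
```

If a row holds more than one record (as in the "/"-separated rows in the example quoted below), the pieces will not line up by position, so such rows would need to be separated first.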

Best
Daniel

On Sat, Mar 26, 2011 at 4:42 PM, Eric Booth <[email protected]> wrote:
> <>
> Daniel:
> I missed the part of your post where you said you want to capture PB and PP as well.
> You could grab these from whichever of the var1? variables (var11-var19) in my previous example contains this information. An entirely different approach is to use the string functions (see -help string_functions-) subinstr() or strpos() to generate indicators for whether var1 contains the substrings of interest. That lets you skip the -split- or regex* approaches completely, if this is all you need from var1:
>
> ***********************!
> clear
> inp str200 var1
> "155 - VITAL DO REGO FILHO - PB - Senador"
> "1111 -  - PP -  - Deputado Federal / 25888 - ATAIDES MENDES PEDROSA -PB - Deputado Estadual"
> "1111 -  - PP -  - Deputado Federal / 22333 - EDNALDO PEREIRA DESANTANA - PB - Deputado Estadual"
> "151 - JOSE WILSON SANTIAGO - PB - Senador"
> "45123 - ANTONIO HERVAZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
> end
>
> g DF = 1 if strpos(var1, "Deputado Federal")
> g DE = 1 if strpos(var1, "Deputado Estadual")
> g S = 1 if strpos(var1, "Senador")
> g PP = 1 if strpos(var1, "PP")
> g PB = 1 if strpos(var1, "PB")
> order D* P* S
> ***********************!
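
A possible variation on the indicators above, offered as a sketch rather than as part of Eric's code: `g X = 1 if ...` leaves non-matching rows missing, and a bare strpos() on "PP" or "PB" would also fire if those letters happened to occur inside a name. The version below builds 0/1 variables and matches the two-letter codes only between dashes. The is_* variable names and the regular expression are assumptions introduced here.

```stata
* Two rows taken from the example data in this thread
clear
input str200 var1
"155 - VITAL DO REGO FILHO - PB - Senador"
"1111 -  - PP -  - Deputado Federal / 25888 - ATAIDES MENDES PEDROSA -PB - Deputado Estadual"
end

* strpos() returns 0 when the substring is absent, so "> 0" gives a 0/1 indicator
gen byte is_DF = strpos(var1, "Deputado Federal") > 0
gen byte is_DE = strpos(var1, "Deputado Estadual") > 0
gen byte is_S  = strpos(var1, "Senador") > 0

* Match the code only when it sits between dashes, allowing uneven spacing
gen byte is_PP = regexm(var1, "- *PP *-")
gen byte is_PB = regexm(var1, "- *PB *-")

list is_*
```

With 0/1 coding, summaries such as -tab- or -mean- count the non-matching rows instead of dropping them as missing.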
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
> Office: +979.845.6754
>
>
>
> On Mar 26, 2011, at 2:30 PM, Eric Booth wrote:
>
>> ***********************!
>> clear
>> inp str200 var1
>> "155 - VITAL DO REGO FILHO - PB - Senador"
>> "1111 -  - PP -  - Deputado Federal / 25888 - ATAIDES MENDES PEDROSA -PB - Deputado Estadual"
>> "1111 -  - PP -  - Deputado Federal / 22333 - EDNALDO PEREIRA DESANTANA - PB - Deputado Estadual"
>> "151 - JOSE WILSON SANTIAGO - PB - Senador"
>> "45123 - ANTONIO HERVAZIO BEZERRA CAVALCANTI - PB - Deputado Estadual"
>> "1212 - DAMIÃO FELICIANO DA SILVA - PB - Deputado Federal"
>> end
>>
>> **using split**
>> replace var1 = subinstr(var1, " / ", " - ", .)
>> split var1, p("-")
>>
>> **trim spaces in new vars**
>> ds var1?
>> foreach v in `r(varlist)' {
>>       replace `v'  = trim(`v')
>>       }
>>
>>
>> **it looks like the substrings you want are in var14, var15, var19:
>> l var14 var15 var19
>>
>> **grab the title or subtitle or gen an indicator if they are present**
>> g str50 title = var14 if  !mi(var14)
>> replace title = var15 if mi(title) & !mi(var15)
>> g str50 title2 = var19 if  !mi(var19)
>> l var1 title title2
>> **or
>> g titleind = 1 if !mi(var14) | !mi(var15)
>> g title2ind = 1 if !mi(var19)
>> order *ind
>> ***********************!
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
