Re: st: Extracting Data
From
Robert Picard <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Extracting Data
Date
Wed, 20 Nov 2013 17:49:55 -0500
Becker,
Here's one way to parse each variable using regex functions.
Robert
* ---------------- begin example -------------------------
clear
input str244 s
"[Meadowfield] Park Sq (Susan Sims) Middle School"
"[Somerset] Upton & Pride School (Judith Taper) El School"
"[Temperly] Lakewood (Jason Stevenson, Jill Harris ) K-12"
"[Packard] W.E.B.Bos ( Bob Williams, Jr.) Middle School"
end
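* district: text between "[" and "]"
gen district = regexs(1) if regexm(s,"\[(.+)\]")
* school name: text between "]" and "("
gen sname = regexs(1) if regexm(s,"\](.+)\(")
* administrators: everything between "(" and ")" (one or two names)
gen principal = regexs(1) if regexm(s,"\((.+)\)")
* school type: everything after ")"
gen stype = regexs(1) if regexm(s,"\)(.+)")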
list district sname principal stype
* ---------------- end example ---------------------------
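The example above keeps any blanks that sit next to the delimiters and leaves both administrators in one variable. itrim() only collapses runs of internal blanks, so a single blank just inside a bracket or parenthesis survives it; trim() on the extracted piece removes it. Below is a minimal sketch of one way to tidy up and split the names, assuming the four variables created above already exist. The names comma, rest, is_suffix and asst are made up for illustration, and the list of name suffixes is an assumption, not something taken from the original data.

```stata
* ---------------- begin example -------------------------
* assumes district, sname, principal and stype exist (see example above)
foreach v of varlist district sname principal stype {
    * trim() drops leading/trailing blanks; itrim() collapses internal runs
    replace `v' = trim(itrim(`v'))
}

* position of the first comma in the administrators field (0 if none)
gen comma = strpos(principal, ",")

* text after the first comma; empty when there is no comma
gen rest = trim(substr(principal, comma + 1, .)) if comma > 0

* assumption: a remainder that is only a suffix belongs to the principal
gen byte is_suffix = inlist(rest, "Jr.", "Sr.", "II", "III")

* otherwise treat the remainder as the assistant principal
gen asst = rest if comma > 0 & !is_suffix
replace principal = trim(substr(principal, 1, comma - 1)) if comma > 0 & !is_suffix

drop comma rest is_suffix
list district sname principal asst stype
* ---------------- end example ---------------------------
```

This only looks at the first comma, so an entry that has both a suffix and an assistant principal (for example "Bob Williams, Jr., Jane Doe") would be split in the wrong place and would need extra handling.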
On Wed, Nov 20, 2013 at 4:38 PM, Becker Stein <[email protected]> wrote:
>
> -----Original Message-----
> From: Becker Stein <[email protected]>
> To: statalist <[email protected].>
> Sent: Wed, Nov 20, 2013 9:23 pm
> Subject: Help Extracting Data
>
> Hi,
>
> I'm trying to extract data from a single string variable, and I was
> wondering how to create a regular expression that I can
> use to do so. I've tried to create one just to extract the school
> name, but to no avail. My data is set up as: [school district] name of
> school (name of principal, name of assistant principal (*if any))
> school type. Below are some examples.
>
> [Meadowfield] Park Square (Susan Sims, John Riley) Middle School
> [Somerset] Upton & Pride Day School (Judith Taper) Elementary School
> [Temperly] Lakewood School (Jason Stevenson, Jill Harris ) K-12
> [Packard] W.E.B. Du Bois ( Robert Williams, Jr.) Middle School
>
> I would like to extract the school name, principal name and asst.
> principal name as separate variables. Sometimes the names have special
> characters such as an "&" (as in the case of Upton & Pride) or a ".",
> and the administrators section may have only 1 name or 2 names
> (separated by a comma). Also, some of the data in the brackets and
> parentheses have extra spaces. I initially used the itrim function on
> the variable, and it removed the extra spaces for the content outside of the
> brackets and parentheses (i.e., school name and school type), but it didn't
> work for content inside of them (school district and principal names).
> Thanks in advance for any/all help.
>
> Best,
> Becker
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/