Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Extracting Data


From   Becker Stein <[email protected]>
To   [email protected]
Subject   Re: st: Extracting Data
Date   Thu, 21 Nov 2013 00:40:22 -0500 (EST)

Hi Robert,

Thanks so much for your help. This worked perfectly.

Best,
Becker

-----Original Message-----
From: Robert Picard <[email protected]>
To: statalist <[email protected]>
Sent: Wed, Nov 20, 2013 10:51 pm
Subject: Re: st: Extracting Data

Becker

Here's one way to parse each variable using regex functions.

Robert

* ---------------- begin example -------------------------
clear
input str244 s
"[Meadowfield] Park Sq (Susan Sims) Middle School"
"[Somerset] Upton & Pride School (Judith Taper) El School"
"[Temperly] Lakewood (Jason Stevenson, Jill Harris ) K-12"
"[Packard] W.E.B.Bos ( Bob Williams, Jr.) Middle School"
end

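* regexs(1) returns the text captured by the first (...) group of the
* most recent successful regexm() match. ".+" is greedy, so each pattern
* assumes one [...] pair and one (...) pair per observation.
gen district = regexs(1) if regexm(s,"\[(.+)\]")
gen sname = regexs(1) if regexm(s,"\](.+)\(")
gen principal = regexs(1) if regexm(s,"\((.+)\)")
gen stype = regexs(1) if regexm(s,"\)(.+)")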

list district sname principal stype
* ---------------- end example ---------------------------
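
The example above leaves any padding inside the brackets and parentheses in place and keeps both administrators in a single variable. A minimal sketch of a possible follow-up step, not part of the original post, to be run after the example: it assumes the variables district, sname, principal and stype already exist, that at most two names are listed, and that a comma followed only by "Jr." or "Sr." belongs to the first name. The helper variables comma and rest and the new variable asst are hypothetical names chosen for illustration.

```stata
* Assumes district, sname, principal and stype were created as above.
* Remove leading/trailing blanks and collapse repeated internal blanks.
foreach v of varlist district sname principal stype {
    replace `v' = trim(itrim(`v'))
}

* Position of the first comma in the administrators string (0 if none).
gen comma = strpos(principal, ",")

* Text after the first comma; empty when there is no comma.
gen rest = trim(substr(principal, comma + 1, .)) if comma > 0

* Treat that text as the assistant principal unless it is only a suffix
* (assumption: Jr. and Sr. are the only suffixes in the data).
gen asst = rest if comma > 0 & !inlist(rest, "Jr.", "Sr.", "Jr", "Sr")

* Keep only the first name in principal when an assistant was found.
replace principal = trim(substr(principal, 1, comma - 1)) if asst != ""

drop comma rest
list district sname principal asst stype
```

An entry that combines a suffix with a second name (for example "A. Smith, Jr., B. Jones") is not handled by this sketch and would need an extra rule.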

On Wed, Nov 20, 2013 at 4:38 PM, Becker Stein <[email protected]> wrote:

-----Original Message-----
From: Becker Stein <[email protected]>
To: statalist <[email protected].>
Sent: Wed, Nov 20, 2013 9:23 pm
Subject: Help Extracting Data

Hi,

I'm trying to extract data from a single string variable, and I was
wondering how to create a regular expression that I can
use to do so. I've tried to create one just to extract the school
name, but to no avail. My data is set up as: [school district] name of
school (name of principal, name of assistant principal (*if any))
school type. Below are some examples.

[Meadowfield] Park Square (Susan Sims, John Riley) Middle School
[Somerset] Upton & Pride Day School (Judith  Taper) Elementary School
[Temperly] Lakewood School (Jason Stevenson, Jill Harris ) K-12
[Packard] W.E.B. Du Bois ( Robert Williams, Jr.) Middle School

I would like to extract the school name, principal name and asst.
principal name as separate variables. Sometimes the names have special
characters such as an "&" (as in the case of Upton & Pride) or a ".",
and the administrators section may have only 1 name or 2 names
(separated by a comma). Also, some of the data in the brackets and
parentheses have extra spaces. I initially used the itrim function on
the variable, and it removed the extra spaces for the content outside
of the brackets and parentheses (i.e., school name and school type),
but it didn't work for content inside of them (school district and
principal names).

Thanks in advance for any/all help.

Best,
Becker

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index