

st: regexs and regexm

From	Simon Falck <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: regexs and regexm
Date	Thu, 03 Oct 2013 14:22:23 +0200

Dear Statalist,

Using Stata 11.2, I want to extract portions of a string variable using regular expressions, i.e. -regexs- and -regexm-.

This job is a bit tricky because the string variable contains several different types of expressions of varying lengths, sometimes with spaces. The information looks something like this:


string variable
UK/FI/EI
PMSE    NO(20)
PMSE    NO(20),EI(5),GE(35),CN(20)
PMSE2004    NO(50),EI(10),GE(30),UK(30),SW(30)
POLARLIS    FR(220)
LIDAR_GPS    NI(20),NO(20)
IASK    SE(60),NO(20),UK(20)

What I want is to extract the decomposed information from the string variable into new columns, such as:

var1  var2  var3  var4  var5  var6  var7  var8  var9  var10  var11  var12
UK    FI    EI
PM    SE    NO    20
PM    SE    NO    20    EI    5     GE    30    UK    30     SE     30

As I understand it, one way of doing this is to use Stata's regular expression functions -regexs- and -regexm-, e.g.:


gen x1 = regexs(1)+ regexs(2) if regexm(expnamn, "([a-zA-Z])([a-zA-Z]+)")
gen x2 = regexs(1)+ regexs(2) if regexm(expnamn, "([0-9]+)*([0-9]+)")
..and so on..
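
To make the idea concrete, here is a minimal sketch of what one such step could look like. It assumes that a country entry always consists of two capital letters followed by a number in parentheses, which the first example row (UK/FI/EI) already violates. The names code1 and num1 are made up for illustration, and only the first matching entry per observation is picked up.

```stata
* Toy data: a few of the example values from above
clear
input str60 expnamn
"UK/FI/EI"
"PMSE    NO(20)"
"POLARLIS    FR(220)"
"LIDAR_GPS    NI(20),NO(20)"
end

* Assumed pattern: two capital letters, then digits inside parentheses.
* regexs(1) is the first parenthesised subexpression (the code),
* regexs(2) the second (the number); \( and \) match literal parentheses.
gen str2 code1 = regexs(1) if regexm(expnamn, "([A-Z][A-Z])\(([0-9]+)\)")
gen num1 = real(regexs(2)) if regexm(expnamn, "([A-Z][A-Z])\(([0-9]+)\)")

list expnamn code1 num1
```

Observations without such an entry, like UK/FI/EI, are left empty in code1 and missing in num1, and any second or later entry in the same string is ignored by this step.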

However, because the contents of the string variable vary so much, this task appears far more complex than I first thought, and I am unable to construct a script that decomposes the string variable efficiently.


Any suggestions?

Thanks in advance,
Simon


Follow-Ups:
- Re: st: regexs and regexm
  - From: Robert Picard <[email protected]>
