Re: st: Re: finding a word within a string variable in Stata 12

From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Re: finding a word within a string variable in Stata 12
Date   Thu, 22 Mar 2012 06:56:20 +0000

For completeness, note that

gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")

gets you there too (and that as above your two statements could be
collapsed to one).  I am not dogmatic against regex machinery. For
examples see

Cox, N.J. 2011.
Speaking Stata: MMXI and all that: Handling Roman numerals within Stata.
Stata Journal 11(1): 126-142.

Abstract.  The problem of handling Roman numerals in Stata is used to
illustrate issues arising in the handling of classification codes in
character string form and their numeric equivalents. The solutions
include Stata programs and Mata functions for conversion from numeric
to string and from string to numeric. Defining acceptable input and
trapping and flagging incorrect or unmanageable inputs are key
concerns in good practice. Regular expressions are especially valuable
for this problem.

and -moss- from SSC by Robert Picard and myself. I just find myself
pointing out how easy -strpos()- is to use in many problems.
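
A minimal sketch of the -strpos()- approach, using the three
organization names quoted below as test data. -strpos()- returns the
position of the first match, or 0 if there is none. Stata treats any
nonzero value as true, so combining two positions with & yields 1 only
when both substrings occur.

```stata
clear
input str60 orgname
"Seattle Brotherhood of Whatever Benevolent Association"
"Memphis Big Capital Employees Benevolent Assoc"
"Peoria Association of Dairy Farmers"
end

* strpos() gives the starting position of the substring, or 0 if absent.
* Any nonzero value counts as true, so & yields 1 only if both are found.
gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")

list orgname benev
```

No preliminary -gen benev = 0- is needed, because the logical
expression already evaluates to 0 or 1 in every observation.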


On Thu, Mar 22, 2012 at 1:28 AM, Michael Mulcahy
<[email protected]> wrote:

> I have been using regexm way too much recently - I'm categorizing non-profit organizations based on strings of organizational names, such as:
> obs1: orgname == "Seattle Brotherhood of Whatever Benevolent Association"
> obs2: orgname == "Memphis Big Capital Employees Benevolent Assoc"
> obs3: orgname == "Peoria Association of Dairy Farmers"
> My klunky approach is:
> gen benev = 0
> replace benev = regexm(orgname, "Benev") & regexm(orgname, "Assoc")
> This codes obs1 & obs2 as "1", and leaves obs3 as "0"

Nick Cox <[email protected]> wrote:

> I haven't tried to see what doesn't work with the regex machinery
> because this problem seems to call only for
> gen construction = strpos(sic, "construction") > 0
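
As a quick check of the one-liner above, here is a sketch with made-up
values of sic: two modeled on the categories mentioned in the original
question, plus one non-matching value invented for contrast. Note that
-strpos()- is case-sensitive. The second variable, construction2, is a
hypothetical addition that lowercases sic first in case capitalized
variants occur.

```stata
clear
input str20 sic
"construction 1"
"b construction"
"retail"
end

* > 0 turns the position returned by strpos() into a 0/1 indicator
gen construction = strpos(sic, "construction") > 0

* lower() guards against capitalized variants (an added precaution,
* not part of the original suggestion)
gen construction2 = strpos(lower(sic), "construction") > 0

list
```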

On Wed, Mar 21, 2012 at 7:28 PM, Navarro Paniagua, Maria wrote:

>> I am trying to find a word (for instance, construction) within a string
>> variable (sic). The string can take values such as "construction 1" or
>> "b construction".
>> Could you please help me with this?
>> gen construction = regexs(1) if regexm(sic, "[construction]+")
>> g one = 1 if strmatch(sic, "*constr*")
