Re: st: Re: finding a word within a string variable in Stata 12


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: finding a word within a string variable in Stata 12
Date   Thu, 22 Mar 2012 06:56:20 +0000

For completeness, note that

gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")

gets you there too (and that as above your two statements could be
collapsed to one).  I am not dogmatic against regex machinery. For
examples see

Cox, N. J. 2011.
Speaking Stata: MMXI and all that: Handling Roman numerals within Stata.
Stata Journal 11(1): 126-142.

Abstract.  The problem of handling Roman numerals in Stata is used to
illustrate issues arising in the handling of classification codes in
character string form and their numeric equivalents. The solutions
include Stata programs and Mata functions for conversion from numeric
to string and from string to numeric. Defining acceptable input and
trapping and flagging incorrect or unmanageable inputs are key
concerns in good practice. Regular expressions are especially valuable
for this problem.

and -moss- from SSC by Robert Picard and myself. I just find myself
pointing out how easy -strpos()- is to use in many problems.
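
As a minimal sketch of the point above, assuming a toy dataset built from the three organization names quoted later in this thread: -strpos()- returns the position of the first match, or 0 if there is none, and Stata treats any nonzero value as true, so the logical AND comes out as 1 only when both substrings are found. The variable name benev2 is introduced here purely for illustration.

```stata
* Toy data: the three organization names quoted in this thread
clear
input str60 orgname
"Seattle Brotherhood of Whatever Benevolent Association"
"Memphis Big Capital Employees Benevolent Assoc"
"Peoria Association of Dairy Farmers"
end

* strpos() gives the position of the first match, or 0 if there is none.
* Any nonzero value counts as true, so & yields 1 only if both are found.
gen benev = strpos(orgname, "Benev") & strpos(orgname, "Assoc")

* The regexm() approach collapsed into a single statement (benev2 is a
* hypothetical name used only for comparison)
gen benev2 = regexm(orgname, "Benev") & regexm(orgname, "Assoc")

list orgname benev benev2

* The two indicators should agree on every observation
assert benev == benev2
```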

Nick

On Thu, Mar 22, 2012 at 1:28 AM, Michael Mulcahy
<mulcahy_uconn@yahoo.com> wrote:

> I have been using regexm way too much recently - I'm categorizing non-profit organizations based on strings in their organizational names, such as:
>
>
> obs1: orgname == "Seattle Brotherhood of Whatever Benevolent Association"
> obs2: orgname == "Memphis Big Capital Employees Benevolent Assoc"
> obs3: orgname == "Peoria Association of Dairy Farmers"
>
> My klunky approach is:
>
>
> gen benev = 0
> replace benev = regexm(orgname, "Benev") & regexm(orgname, "Assoc")
>
>
> This codes obs1 & obs2 as "1", and leaves obs3 as "0"

Nick Cox <njcoxstata@gmail.com> wrote:

> I haven't tried to see what doesn't work with the regex machinery
> because this problem seems to call only for
>
> gen construction = strpos(sic, "construction") > 0
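
A short sketch of why -strpos()- fits this problem, using the two category values mentioned in the original question plus one hypothetical value ("retail") added here only for contrast. In a regular expression, square brackets define a set of characters, so "[construction]+" matches any run of the letters c, o, n, s, t, r, u, i rather than the word itself; and -regexs(1)- refers to the first parenthesized subexpression, of which that pattern has none. The variable name charclass is an illustrative assumption.

```stata
* Toy data: two values from the question, plus "retail" (hypothetical)
clear
input str20 sic
"construction 1"
"b construction"
"retail"
end

* Word search: 1 if "construction" occurs anywhere in sic, 0 otherwise
gen construction = strpos(sic, "construction") > 0

* Bracket expression: matches one or more of the letters c,o,n,s,t,r,u,i,
* so it also flags "retail", which contains r, t and i
gen charclass = regexm(sic, "[construction]+")

list
```

If a regular expression is wanted anyway, the literal pattern without brackets, regexm(sic, "construction"), searches for the word itself.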

On Wed, Mar 21, 2012 at 7:28 PM, Navarro Paniagua, Maria wrote:

>> I am trying to find a word (for instance, construction) within a string
>> variable (sic). The string can take values such as "construction 1" or
>> "b construction".
>>
>>
>>
>> Could you please help me with this?
>>
>>
>>
>> gen construction = regexs(1) if regexm(sic, "[construction]+")
>>
>> g one = 1 if strmatch(sic, "*constr*")
>>
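
For comparison, a brief sketch of the -strmatch()- line in the question, on the same kind of toy data ("retail" is again a hypothetical value, and one01 is a name made up for this example). With the if qualifier, non-matching observations are left missing rather than set to 0; since -strmatch()- itself returns 1 or 0, assigning its result directly gives a complete indicator.

```stata
* Toy data: two values from the question, plus "retail" (hypothetical)
clear
input str20 sic
"construction 1"
"b construction"
"retail"
end

* As written in the question: 1 where the pattern matches, missing elsewhere
gen one = 1 if strmatch(sic, "*constr*")

* strmatch() already returns 1 or 0, so the if qualifier can be dropped
gen one01 = strmatch(sic, "*constr*")

list
```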
