Statalist



Re: st: RE: RE: Regexr Stata


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: Regexr Stata
Date   Fri, 26 Sep 2008 11:23:13 -0400

On Fri, Sep 26, 2008 at 9:08 AM, Diana Eastman <deastman@gma-us.com> wrote:
>
> Thank you for the responses. This is incredibly helpful.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu on behalf of Matt Spittal
> Sent: Thu 9/25/2008 10:18 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Regexr Stata
>
> Diana,
>
> One way of searching a string for a certain match (regardless of its position) is to do the following
>
>        generate str15 grp = "A.I.F" if regexm(name, "A.I.F")

The same can also be achieved with the simpler strpos() function:
         generate str15 grp = "A.I.F" if strpos(name, "A.I.F")
since the pattern above hardly uses any regular-expression features.
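
One difference between the two is worth checking on your own data: in regexm() an unescaped "." matches any single character, while strpos() looks for the literal substring. The sketch below illustrates this on made-up data; the name "AXIXF HOLDINGS" and the variable names re_plain, re_escaped and literal are invented for the illustration and are not from the original question.

```stata
* Hypothetical test data; the second name is invented to show a false match
clear
input str20 name
"A.I.F. GMBH"
"AXIXF HOLDINGS"
"QANTAS"
end

* In a regular expression an unescaped "." matches any single character,
* so this also flags "AXIXF HOLDINGS"
generate byte re_plain   = regexm(name, "A.I.F")

* Escaping the dots makes the regular expression match them literally
generate byte re_escaped = regexm(name, "A\.I\.F")

* strpos() returns the position of the literal substring, or 0 if absent
generate byte literal    = strpos(name, "A.I.F") > 0

list, clean
```

If the names never contain look-alikes such as "AXIXF", all three approaches should flag the same rows, and strpos() is probably the easiest to read.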

Regards, Sergiy Radyakin

>
> This will create a string variable called 'grp' which will equal A.I.F if A.I.F appears anywhere within the variable 'name'. This works because -regexm(name, "A.I.F")- returns 1 if the statement is true and 0 if it is false.  So Stata will create a variable called 'grp' and assign it the value A.I.F if the statement is true and missing if it is not.
>
> As an extension to this, if your data looks like this
>
>        A.I.F. GMBH
>        A.I.F. COMPANY
>        QANTAS
>        AIR NEW ZEALAND
>
> and you also want to identify, say, QANTAS flights, then you can add this line to your code
>
>        replace grp = "QANTAS" if regexm(name, "QANTAS")
>
> You'll have to be careful if your data looks something like this
>
>        A.I.F. GMBH
>        A.I.F. COMPANY
>        QANTAS A.I.F
>        AIR NEW ZEALAND
>
> because the value A.I.F, which was created with the -generate- statement, will be replaced with QANTAS in the -replace- statement.
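
One possible way to guard against this overwrite is to let later rules fill only rows that are still empty, so that the first matching rule wins. This is a minimal sketch under that assumption (that earlier rules should take precedence), using the example names above and strpos() for literal matching; it may not be the rule you want for your data.

```stata
* Example data from the message above
clear
input str20 name
"A.I.F. GMBH"
"A.I.F. COMPANY"
"QANTAS A.I.F"
"AIR NEW ZEALAND"
end

* First rule: rows that do not match are left as the empty string ""
generate str15 grp = "A.I.F" if strpos(name, "A.I.F")

* Later rule: assign only where no earlier rule has already matched
replace grp = "QANTAS" if strpos(name, "QANTAS") & grp == ""

* Optional check: show names that match more than one pattern
list name grp if strpos(name, "A.I.F") & strpos(name, "QANTAS")
```

With this ordering "QANTAS A.I.F" keeps the group A.I.F. Swapping the two rules would give QANTAS priority instead, and the final -list- shows the ambiguous names so they can be reviewed by hand.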
>
> -- Matt
> matt.spittal@cancervic.org.au
>
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Diana Eastman
> Sent: Friday, 26 September 2008 7:36 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Regexr Stata
>
>
> Hi all,
>
> I have a variable called "name" which lists several different airlines.
> I need to write some code that will identify a regular expression within
> these names and assign them a value in the new variable "group_name"
>
> For instance, for the two names:
>
> A.I.F. GMBH
> A.I.F. COMPANY
>
> I would want the group_name to be only "A.I.F." (the part of the string
> they both share). The identifying string does not occur in the same
> position across the names.
>
> Any help is greatly appreciated.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


