
Re: st: RE: RE: Regexr Stata

From	"Sergiy Radyakin" <[email protected]>
To	[email protected]
Subject	Re: st: RE: RE: Regexr Stata
Date	Fri, 26 Sep 2008 11:23:13 -0400

On Fri, Sep 26, 2008 at 9:08 AM, Diana Eastman <[email protected]> wrote:
>
> Thank you for the responses. This is incredibly helpful.
>
> -----Original Message-----
> From: [email protected] on behalf of Matt Spittal
> Sent: Thu 9/25/2008 10:18 PM
> To: [email protected]
> Subject: st: RE: Regexr Stata
>
> Diana,
>
> One way of searching a string for a certain match (regardless of its position) is to do the following:
>
>        generate str15 grp = "A.I.F" if regexm(name, "A.I.F")

The same can also be achieved with the simpler strpos() function:
         generate str15 grp = "A.I.F" if strpos(name, "A.I.F")
since the pattern above makes hardly any use of regular-expression features.
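
One difference between the two is worth a quick check: in regexm() an unescaped dot matches any single character, while strpos() looks for the literal substring. The sketch below uses a small made-up dataset (the name "AXIXF HOLDINGS" and the variables m_regex, m_escaped and m_strpos are illustrative assumptions, not part of the original data) to compare the two:

```stata
clear
input str20 name
"A.I.F. GMBH"
"AXIXF HOLDINGS"
"QANTAS"
end

* Unescaped dots act as wildcards in regexm(), so "AXIXF" also matches
generate byte m_regex = regexm(name, "A.I.F")

* Escaping the dots restricts the match to literal periods
generate byte m_escaped = regexm(name, "A\.I\.F")

* strpos() returns the position of the literal substring, 0 if absent
generate byte m_strpos = strpos(name, "A.I.F") > 0

list name m_regex m_escaped m_strpos
```

If names such as the hypothetical "AXIXF HOLDINGS" cannot occur in the data, the three approaches should flag the same rows.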

Regards, Sergiy Radyakin

>
> This will create a string variable called 'grp' which will equal A.I.F if A.I.F appears anywhere within the variable 'name'. This works because -regexm(name, "A.I.F")- returns 1 if the pattern matches and 0 if it does not. So Stata will create a variable called 'grp' and assign it the value A.I.F where the match succeeds and leave it missing (an empty string) where it does not.
>
> As an extension to this, if your data looks like this
>
>        A.I.F. GMBH
>        A.I.F. COMPANY
>        QANTAS
>        AIR NEW ZEALAND
>
> and you also want to identify, say, QANTAS flights, then you can add this line to your code
>
>        replace grp = "QANTAS" if regexm(name, "QANTAS")
>
> You'll have to be careful if your data looks something like this
>
>        A.I.F. GMBH
>        A.I.F. COMPANY
>        QANTAS A.I.F
>        AIR NEW ZEALAND
>
> because the value A.I.F, which was created with the -generate- statement, will be replaced with QANTAS in the -replace- statement.
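
One possible way to guard against this overwriting, sketched here under the assumption that the earlier rule (A.I.F) should take precedence, is to let -replace- touch only observations where grp is still empty. The example data repeat the four names listed above, and strpos() is used for literal matching:

```stata
clear
input str20 name
"A.I.F. GMBH"
"A.I.F. COMPANY"
"QANTAS A.I.F"
"AIR NEW ZEALAND"
end

generate str15 grp = "A.I.F" if strpos(name, "A.I.F")

* Only fill grp where no earlier rule has matched, so A.I.F is kept
replace grp = "QANTAS" if grp == "" & strpos(name, "QANTAS")

list name grp
```

If QANTAS should win instead, the order of the two statements can simply be swapped; which group takes precedence is a decision about the data, not something the code can settle.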
>
> -- Matt
> [email protected]
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Diana Eastman
> Sent: Friday, 26 September 2008 7:36 AM
> To: [email protected]
> Subject: st: Regexr Stata
>
>
> Hi all,
>
> I have a variable called "name" which lists several different airlines.
> I need to write some code that will identify a regular expression within
> these names and assign them a value in the new variable "group_name"
>
> For instance, for the two names:
>
> A.I.F. GMBH
> A.I.F. COMPANY
>
> I would want the group_name to be only "A.I.F." (the part of the string
> they both share). The identifying string does not occur in the same
> position across the names.
>
> Any help is greatly appreciated.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
