On Fri, Sep 26, 2008 at 9:08 AM, Diana Eastman <deastman@gma-us.com> wrote:
>
> Thank you for the responses. This is incredibly helpful.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu on behalf of Matt Spittal
> Sent: Thu 9/25/2008 10:18 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Regexr Stata
>
> Diana,
>
> One way of searching a string for a certain match (regardless of its position) is to do the following
>
> generate str15 grp = "A.I.F" if regexm(name, "A.I.F")

The same can also be achieved with a simple strpos() function:

generate str15 grp = "A.I.F" if strpos(name, "A.I.F")

since regular expression is hardly used in the above.

Regards,
Sergiy Radyakin

>
> This will create a string variable called 'grp' which will equal A.I.F if A.I.F appeared anywhere within the variable 'name'. This works because -regexm(name, "A.I.F")- returns 1 if the statement is true and 0 if it is false. So Stata will create a variable called 'grp' and assign it the value A.I.F if the statement is true and missing if it is not.
>
> As an extension to this, if your data looks like this
>
> A.I.F. GMBH
> A.I.F. COMPANY
> QANTAS
> AIR NEW ZEALAND
>
> and you also want to identify, say, QANTAS flights, then you can add this line to your code
>
> replace grp = "QANTAS" if regexm(name, "QANTAS")
>
> You'll have to be careful if your data looks something like this
>
> A.I.F. GMBH
> A.I.F. COMPANY
> QANTAS A.I.F
> AIR NEW ZEALAND
>
> because the value A.I.F, which was created with the -generate- statement, will be replaced with QANTAS in the -replace- statement.
>
> -- Matt
> matt.spittal@cancervic.org.au
>
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Diana Eastman
> Sent: Friday, 26 September 2008 7:36 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Regexr Stata
>
>
> Hi all,
>
> I have a variable called "name" which lists several different airlines.
> I need to write some code that will identify a regular expression within
> these names and assign them a value in the new variable "group_name"
>
> For instance, for the two names:
>
> A.I.F. GMBH
> A.I.F. COMPANY
>
> I would want the group_name to be only "A.I.F." (the part of the string
> they both share). The identifying string does not occur in the same
> position across the names.
>
> Any help is greatly appreciated.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
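
A follow-up note on the two suggestions in this thread. In a regular expression an unescaped "." matches any single character, so regexm(name, "A.I.F") and strpos(name, "A.I.F") are not strictly equivalent: the first would also flag a name such as "AXIXF", while strpos() only finds the literal substring. The sketch below is a minimal illustration under stated assumptions, not either poster's code: the row "AXIXF CARGO" and the variables grp_re, grp_lit and grp_pos are made up for the example. It also shows one possible way to guard the -replace- step so that an earlier assignment is not overwritten.

```stata
* Toy data: the first four names come from the thread; "AXIXF CARGO" is hypothetical.
clear
input str20 name
"A.I.F. GMBH"
"A.I.F. COMPANY"
"QANTAS A.I.F"
"AIR NEW ZEALAND"
"AXIXF CARGO"
end

* Unescaped dots match any character, so "AXIXF CARGO" is flagged as well.
generate str15 grp_re = "A.I.F" if regexm(name, "A.I.F")

* Escaping the dots restricts the match to literal periods.
generate str15 grp_lit = "A.I.F" if regexm(name, "A\.I\.F")

* strpos() returns the position of a literal substring, or 0 if it is absent.
generate str15 grp_pos = "A.I.F" if strpos(name, "A.I.F") > 0

* Only fill in QANTAS where no group has been assigned yet.
replace grp_lit = "QANTAS" if regexm(name, "QANTAS") & grp_lit == ""

list name grp_re grp_lit grp_pos
```

With the guard in place, "QANTAS A.I.F" keeps the value A.I.F from the -generate- step. Whether the first or the last match should win depends on the data, so the order of the statements is a design choice rather than a rule.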

