
Re: st: "sounds like" function in Stata

From   Thomas Speidel <>
Subject   Re: st: "sounds like" function in Stata
Date   Tue, 18 Aug 2009 13:07:32 -0600

I believe Stata 11 now has a built-in function for this:

Quoting Tirthankar Chakravarty <> Tue 18 Aug 13:01:47 2009:

ssc install _gsoundex

due to Michael Blasnik. Or, in Stata 11:


On Tue, Aug 18, 2009 at 7:55 PM, Dalhia<> wrote:
Hi, is there some kind of "sounds like" function in Stata? I have a list of companies, but the names sometimes differ slightly. For example, AOL Time Warner also appears as AOL, Time Warner, and Time Warner Inc. I need a method to figure out that all of these are the same entity, and none of the string functions in Stata seem to do what I want. Do any of you have any suggestions? Here is what the data look like:


AOL Time Warner
Time Warner Inc
Microsoft Inc

Ideally, what I would like is some way to indicate which names are similar. For example:

Name, Similarity

AOL, 1
AOL Time Warner, 1
Time Warner Inc, 1
Microsoft, 2
Microsoft Inc, 2
Microsft, 2

Any help will be much appreciated.
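
A minimal sketch of the "sounds like" grouping asked for above, assuming the built-in Stata 11 function referred to in the replies is soundex(). The variable names name, sx, and similarity are hypothetical, and the data are the example names from the question:

```stata
* Sketch only: group names that share a soundex code.
* Assumes Stata 11 or later, where soundex() is a built-in string function.
clear
input str20 name
"AOL"
"AOL Time Warner"
"Time Warner Inc"
"Microsoft"
"Microsoft Inc"
"Microsft"
end

* sx holds the soundex code of each name
generate sx = soundex(name)

* similarity numbers the distinct codes 1, 2, ...
egen similarity = group(sx)

sort similarity name
list name sx similarity, sepby(similarity)
```

A soundex code starts with the first letter of the string, so this should put a misspelling such as "Microsft" in the same group as "Microsoft", but it cannot put "AOL Time Warner" and "Time Warner Inc" together. Names that differ by whole words need a different approach, such as splitting the name into words first.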

--- On Fri, 6/5/09, Nick Cox <> wrote:

From: Nick Cox <>
Subject: st: RE: appling string functions across observations
Date: Friday, June 5, 2009, 3:00 PM
Check out -fndmtch2- or -fndmtch- from SSC. At first sight they don't address this problem, but there are at least two ways

It sounds as if you have surnames and full names (or the equivalent in your area). -split- the full names and work with the separate variables.

Clone one of the programs above but modify the code to look for string inclusion rather than strict equality.



I have a list of two variables: name1 and name2. I need to check if name2 occurs in any of the name1s. The regexm() function in Stata is perfect for what I want to do, but it checks only one string at a time, and I need it to somehow rotate over a whole list of

Here is what I have:

ram solanki
goel mehta
ashish gupta


I need to be able to figure out that "solanki" and "mehta"
in name2
occur in name1 observation1 and observation2.
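
The check described above, looking for each name2 in every name1, can be sketched as a loop over observations. This is an illustration only and not one of the programs mentioned in the reply: it uses strpos() for plain string inclusion, the variable found is hypothetical, and the name2 values are made up because that column is not preserved in the message.

```stata
* Sketch only: flag each name2 that occurs inside any name1.
* The name2 values below are invented for illustration.
clear
input str20 name1 str20 name2
"ram solanki"  "solanki"
"goel mehta"   "mehta"
"ashish gupta" "sharma"
end

generate byte found = 0

* For the i-th name2, count how many name1 values contain it
forvalues i = 1/`=_N' {
    quietly count if strpos(name1, name2[`i']) > 0
    quietly replace found = (r(N) > 0) in `i'
}

list name1 name2 found
```

The loop makes one pass over the data for every observation, which is fine for short lists but slow for large ones. Replacing strpos() with regexm() would allow pattern matching instead of plain inclusion.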

*   For searches and help try:


To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).


Thomas Speidel

