Statalist


Re: st: "sounds like" function in Stata


From   Thomas Speidel <[email protected]>
To   [email protected]
Subject   Re: st: "sounds like" function in Stata
Date   Tue, 18 Aug 2009 13:07:32 -0600

I believe Stata 11 now has a built-in function for this:
http://www.stata.com/help.cgi?soundex
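
A minimal sketch of how soundex() might be applied to the names in the question below, assuming Stata 11 or later. The variable names code and similarity are illustrative only and are not from the thread.

```stata
* Sketch only: the six example names from the original question
clear
input str20 name
"AOL"
"AOL Time Warner"
"Time Warner Inc"
"Microsoft"
"Microsoft Inc"
"Microsft"
end

* soundex() returns a code made of a letter followed by three digits
generate code = soundex(name)

* give every distinct code its own group number (hypothetical variable name)
egen similarity = group(code)

sort similarity name
list name code similarity, sepby(similarity)
```

Because a soundex code keeps the first letter of the string, this should catch misspellings such as "Microsft" but cannot put "AOL" and "Time Warner Inc" in the same group; those would still need to be matched some other way.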

Quoting Tirthankar Chakravarty <[email protected]> Tue 18 Aug 13:01:47 2009:

ssc install _gsoundex

by Michael Blasnik. Or, in Stata 11, see the string functions:
http://www.stata.com/help.cgi?string+functions

T

On Tue, Aug 18, 2009 at 7:55 PM, Dalhia<[email protected]> wrote:
Hi, Is there some kind of "sounds like" function in Stata? I have a list of companies, but the names are sometimes a little different. For example, AOL Time Warner also appears as AOL, Time Warner, and Time Warner Inc. I need a method to figure out that all these are the same entity, and none of the string functions in Stata seem to do what I want. Do any of you have any suggestions? Here is what the data look like:

Name

AOL
AOL Time Warner
Time Warner Inc
Microsoft
Microsoft Inc
Microsft

Ideally, what I would like is some way to indicate which names are similar. For example:

Name, Similarity

AOL, 1
AOL Time Warner, 1
Time Warner Inc, 1
Microsoft, 2
Microsoft Inc, 2
Microsft, 2

Any help will be much appreciated.
Thanks
Dalhia

--- On Fri, 6/5/09, Nick Cox <[email protected]> wrote:

From: Nick Cox <[email protected]>
Subject: st: RE: appling string functions across observations
To: [email protected]
Date: Friday, June 5, 2009, 3:00 PM
Check out -fndmtch2- or -fndmtch- from SSC. At first sight they don't address this problem, but there are at least two ways forward:

It sounds as if you have surnames and full names (or the equivalent in your area). -split- the full names and work with the separate variables.

Clone one of the programs above but modify the code to look for string inclusion rather than strict equality.

Nick
[email protected]


Dalhia

I have two variables: name1 and name2. I need to check whether each name2 occurs in any of the name1s. The regexm() function in Stata is perfect for what I want to do, but it checks only one string at a time, and I need it to somehow loop over a whole list of names.

Here is what I have:

name1
ram solanki
goel mehta
ashish gupta

name2
solanki
mehta

I need to be able to figure out that "solanki" and "mehta" in name2 occur in observations 1 and 2 of name1.
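
One way the string-inclusion idea suggested above could look for this name1/name2 layout is a loop over observations with strpos(). This is only a sketch under stated assumptions: the data are typed in with -input-, the two variables sit side by side in one dataset, and the variable found is a hypothetical name that is not part of the thread.

```stata
* Sketch only: example data from the question, with name2 left empty
* where there is no second name
clear
input str20 name1 str20 name2
"ram solanki"  "solanki"
"goel mehta"   "mehta"
"ashish gupta" ""
end

* found = 1 when any nonempty name2 occurs somewhere inside name1
generate byte found = 0
forvalues i = 1/`=_N' {
    * name2[`i'] is the i-th value of name2; strpos() is > 0 on a match
    replace found = 1 if name2[`i'] != "" & strpos(name1, name2[`i']) > 0
}

list name1 name2 found
```

The loop compares every name2 with every name1, so the work grows with the square of the number of observations; that is fine for short lists but may be slow for very long ones.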


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/










--
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).





--
Thomas Speidel




