Statalist



st: "sounds like" function in Stata


From   Dalhia <ggs_da@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: "sounds like" function in Stata
Date   Tue, 18 Aug 2009 11:55:45 -0700 (PDT)

Hi, is there some kind of "sounds like" function in Stata? I have a list of companies, but the names sometimes differ slightly. For example, AOL Time Warner also appears as AOL, Time Warner, and Time Warner Inc. I need a way to identify all of these as the same entity, and none of Stata's string functions seems to do what I want. Do any of you have suggestions? Here is what the data look like:

Name

AOL
AOL Time Warner
Time Warner Inc
Microsoft
Microsoft Inc
Microsft 

Ideally, what I would like is some way to indicate which names are similar. For example:

Name, Similarity

AOL, 1
AOL Time Warner, 1
Time Warner Inc, 1
Microsoft, 2
Microsoft Inc, 2
Microsft, 2

Any help will be much appreciated. 
Thanks
Dalhia
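
One possible approach to the grouping asked for above, sketched in Python rather than Stata: treat two names as related when they share a word, allowing for small misspellings, and then number the connected clusters in order of first appearance. Stata does have a soundex() function, but a phonetic code computed on the whole string would not by itself link AOL to Time Warner Inc, hence the word-level comparison here. The suffix list, the 0.85 threshold, and all function names are assumptions made for illustration; none of them comes from the thread.

```python
from difflib import SequenceMatcher

# Assumed list of corporate suffixes to ignore; extend as the data require.
SUFFIXES = {"inc", "corp", "ltd", "llc", "co"}


def tokens(name):
    """Lowercase a name, strip commas and periods, drop corporate suffixes."""
    words = name.lower().replace(",", " ").replace(".", " ").split()
    return {w for w in words if w not in SUFFIXES}


def similar(a, b, threshold=0.85):
    """True when two words are equal or differ by a small typo (assumed cutoff)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def linked(name_a, name_b):
    """Two names are linked when any word of one resembles any word of the other."""
    return any(similar(x, y) for x in tokens(name_a) for y in tokens(name_b))


def group_names(names):
    """Return one group number per name; linked names share a number."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every linked pair.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if linked(names[i], names[j]):
                parent[find(j)] = find(i)

    # Number clusters 1, 2, ... in order of first appearance.
    labels = {}
    groups = []
    for i in range(len(names)):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        groups.append(labels[root])
    return groups


names = ["AOL", "AOL Time Warner", "Time Warner Inc",
         "Microsoft", "Microsoft Inc", "Microsft"]
for name, group in zip(names, group_names(names)):
    print(f"{name}, {group}")
```

On the six names above this gives groups 1, 1, 1, 2, 2, 2. Because the links are transitive, a common word such as "International" or "General" would merge unrelated companies, so the resulting groups would still need checking by eye.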

--- On Fri, 6/5/09, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> From: Nick Cox <n.j.cox@durham.ac.uk>
> Subject: st: RE: appling string functions across observations
> To: statalist@hsphsun2.harvard.edu
> Date: Friday, June 5, 2009, 3:00 PM
> Check out -fndmtch2- or -fndmtch- from SSC. At first sight they don't
> address this problem, but there are at least two ways forward:
>
> It sounds as if you have surnames and full names (or the equivalent in
> your area). -split- the full names and work with the separate variables.
>
> Clone one of the programs above but modify the code to look for string
> inclusion rather than strict equality.
> 
> Nick 
> n.j.cox@durham.ac.uk
> 
> 
> Dalhia
> 
> I have two variables: name1 and name2. I need to check whether each
> name2 occurs in any of the name1 values. The regexm() function in Stata
> is perfect for what I want to do, but it checks only one string at a
> time, and I need it to loop over a whole list of names.
>
> Here is what I have:
> 
> name1  
> ram solanki 
> goel mehta
> ashish gupta
> 
> name2
> solanki
> mehta
> 
> I need to be able to figure out that "solanki" and "mehta" in name2
> occur in observations 1 and 2 of name1, respectively.
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
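
The older question quoted above (which name1 entries contain a name2 entry) comes down to the string-inclusion test Nick suggests, applied to every pair. Below is a minimal sketch in Python, not Stata. The function name rows_containing is made up for illustration, and matching is plain case-insensitive substring inclusion, so "mehta" would also match inside a longer word.

```python
def rows_containing(name1, name2):
    """For each non-empty name2 entry, list the 1-based name1 rows that contain it."""
    hits = {}
    for needle in name2:
        needle = needle.strip().lower()
        if not needle:
            continue  # a blank entry would otherwise match every row
        hits[needle] = [
            row
            for row, full in enumerate(name1, start=1)
            if needle in full.lower()
        ]
    return hits


name1 = ["ram solanki", "goel mehta", "ashish gupta"]
name2 = ["solanki", "mehta", ""]
print(rows_containing(name1, name2))  # {'solanki': [1], 'mehta': [2]}
```

Blank name2 entries are skipped because the two columns have different lengths in the example, and an empty string is contained in every other string.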



      



