Statalist



Re: st: "sounds like" function in Stata


From   Tirthankar Chakravarty <[email protected]>
To   [email protected]
Subject   Re: st: "sounds like" function in Stata
Date   Tue, 18 Aug 2009 20:01:47 +0100

ssc install _gsoundex

due to Michael Blasnik. Or, in Stata 11, see the built-in soundex() string function:
http://www.stata.com/help.cgi?string+functions
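
A minimal sketch of the Stata 11 route, assuming the built-in soundex() string function is available (Stata 11 or later). The variable names code and similarity are made up for illustration, and the data are the example names from the question below.

```stata
* Sketch only: assumes Stata 11+, where soundex() is a built-in string function.
clear
input str20 name
"AOL"
"AOL Time Warner"
"Time Warner Inc"
"Microsoft"
"Microsoft Inc"
"Microsft"
end

* One Soundex code per name (variable name "code" is illustrative).
generate code = soundex(name)

* Names sharing a code get the same integer id ("similarity" is illustrative).
egen similarity = group(code)

sort similarity name
list name code similarity, sepby(similarity)
```

Soundex keeps the first letter of the string, so this is likely to catch misspellings such as "Microsft" but cannot put "AOL" and "Time Warner Inc" in the same group. Cases like that probably need the names split into words first, or a check for string inclusion.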

T

On Tue, Aug 18, 2009 at 7:55 PM, Dalhia<[email protected]> wrote:
> Hi, Is there some kind of "sounds like" function in Stata? I have a list of companies, but the names sometimes differ slightly. For example, AOL Time Warner also appears as AOL, Time Warner, and Time Warner Inc. I need a method to figure out that all these are the same entity, and none of the string functions in Stata seem to do what I want. Do any of you have any suggestions? Here is what the data look like:
>
> Name
>
> AOL
> AOL Time Warner
> Time Warner Inc
> Microsoft
> Microsoft Inc
> Microsft
>
> Ideally, what I would like is some way to indicate which names are similar. For example:
>
> Name, Similarity
>
> AOL, 1
> AOL Time Warner, 1
> Time Warner Inc, 1
> Microsoft, 2
> Microsoft Inc, 2
> Microsft, 2
>
> Any help will be much appreciated.
> Thanks
> Dalhia
>
> --- On Fri, 6/5/09, Nick Cox <[email protected]> wrote:
>
>> From: Nick Cox <[email protected]>
>> Subject: st: RE: appling string functions across observations
>> To: [email protected]
>> Date: Friday, June 5, 2009, 3:00 PM
>> Check out -fndmtch2- or -fndmtch- from SSC. At first sight they don't
>> address this problem, but there are at least two ways forward:
>>
>> It sounds as if you have surnames and full names (or the equivalent in
>> your area). -split- the full names and work with the separate variables.
>>
>> Clone one of the programs above but modify the code to look for string
>> inclusion rather than strict equality.
>>
>> Nick
>> [email protected]
>>
>>
>> Dalhia
>>
>> I have two variables: name1 and name2. I need to check whether
>> name2 occurs in any of the name1 values. The regexm() function in
>> Stata is perfect for what I want to do, but it checks only one
>> string at a time, and I need it to somehow loop over a whole list
>> of names.
>>
>> Here is what I have:
>>
>> name1
>> ram solanki
>> goel mehta
>> ashish gupta
>>
>> name2
>> solanki
>> mehta
>>
>> I need to be able to figure out that "solanki" and "mehta" in name2
>> occur in name1 in observations 1 and 2.
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>



-- 
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).



