
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: identifying strings that differ on one or two letters


From   Dalhia <[email protected]>
To   [email protected]
Subject   Re: st: identifying strings that differ on one or two letters
Date   Fri, 19 Nov 2010 08:05:10 -0800 (PST)

Hi,
I tried both techniques suggested on the list (thank you, Dmitry and Scott), but neither seems to work, and I hope you can tell me what I am doing wrong.

I can't seem to load -strgroup-. When I try to install it in Stata 11, I get the following message:

    package does not contain strgroup.plugin for WIN64A platform
    could not load strgroup.pkg from http://fmwww.bc.edu/RePEc/bocode/s/

I'm sure there is a simple fix, but my knowledge of Stata code is very basic, and I don't know how to solve this problem.

I also tried -soundex()-, but it assigns the same code to completely different companies. For example, "suniti commercials ltd", "sunnytex investments pvt ltd", "sunteck realty & infrastructure ltd", and "syndicate bank" all get the soundex code S532. And -soundex()- does not seem to offer any option that would restrict matches to names that are very similar.
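To see why these names collide: Stata's -soundex()- reduces a string to its first letter followed by three digits that encode the next consonant sounds, so everything after the first few consonants is ignored. The small example below just re-creates the four names from the paragraph above; the variable names -name- and -code- are made up for illustration. According to the report above, all four should receive S532.

```stata
* Re-create the four company names and look at their Soundex codes
clear
input str40 name
"suniti commercials ltd"
"sunnytex investments pvt ltd"
"sunteck realty & infrastructure ltd"
"syndicate bank"
end

* soundex() keeps the first letter and codes only the next three
* consonant sounds (n -> 5, t/d -> 3, c/x -> 2), so the rest is ignored
generate str4 code = soundex(name)
list name code
```

Because the code has a fixed length, -soundex()- can at best serve as a coarse first pass; it has no way to express "differs by at most one or two letters".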

Any suggestions? 

Thanks. I appreciate your help.
dalhia



--- On Fri, 11/19/10, Scott Merryman <[email protected]> wrote:

> From: Scott Merryman <[email protected]>
> Subject: Re: st: identifying strings that differ on one or two letters
> To: [email protected]
> Date: Friday, November 19, 2010, 4:55 PM
> You might try -soundex()-
> 
> . clear
> 
> . set obs 3
> obs was 0, now 3
> 
> . generate str var1 = "Jayanthi chemicals" in 1
> (2 missing values generated)
> 
> . replace var1 = "Jayanth chemicals" in 2
> (1 real change made)
> 
> . replace var1 = "Jay chemicals" in 3
> (1 real change made)
> 
> . gen soundex = soundex(var1)
> 
> . cl
> 
>                       var1   soundex
>   1.   Jayanthi chemicals      J532
>   2.    Jayanth chemicals      J532
>   3.        Jay chemicals      J252
> 
> 
> 
> On Fri, Nov 19, 2010 at 6:59 AM, Dalhia <[email protected]> wrote:
> > Hello,
> >
> > Is there a method in Stata to identify strings that differ by just one or two letters?
> > For example:
> >
> > comp_name
> >
> > Jayanthi chemicals
> > Jayanth chemicals
> > Jay chemicals
> >
> > So here the first two should be identified, since they differ by only one letter,
> > but not the last one, since it differs by 4 letters. Is there a way to do this in Stata?
> >
> > Thanks. I appreciate your help.
> > dalhia
> >
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
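One plugin-free way to answer the original question is to compute the Levenshtein edit distance (the minimum number of single-character insertions, deletions, or substitutions that turns one string into another) directly in Mata, which ships with Stata 11, and then flag names that lie within one or two edits of some other name. This is an independent sketch, not the code of -strgroup- or any other package. The function names -lev_dist()- and -near_match()- and the new variable -match_obs- are hypothetical. The sketch assumes plain ASCII names, compares them case-insensitively, and checks every pair, so running time grows with the square of the number of observations.

```stata
* Sketch: flag names within maxd edits of another name, using a
* Levenshtein distance written in Mata (no compiled plugin required).
clear all

mata:
// Levenshtein distance: minimum number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
real scalar lev_dist(string scalar a, string scalar b)
{
    real scalar i, j, m, n, cost
    real rowvector prev, curr

    m = strlen(a)
    n = strlen(b)
    prev = (0..n)                 // distance from empty string to prefixes of b
    for (i = 1; i <= m; i++) {
        curr = J(1, n + 1, .)
        curr[1] = i               // delete the first i characters of a
        for (j = 1; j <= n; j++) {
            cost = (substr(a, i, 1) != substr(b, j, 1))
            // cheapest of deletion, insertion, and substitution/match
            curr[j + 1] = min((prev[j + 1] + 1, curr[j] + 1, prev[j] + cost))
        }
        prev = curr
    }
    return(prev[n + 1])
}

// For each observation, store in newvar the first other observation
// whose lowercased string is within maxd edits; missing if none.
void near_match(string scalar strvar, string scalar newvar, real scalar maxd)
{
    string colvector s
    real colvector found
    real scalar i, j, n

    s = strlower(st_sdata(., strvar))
    n = rows(s)
    found = J(n, 1, .)
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= n; j++) {
            if (j != i & lev_dist(s[i], s[j]) <= maxd) {
                found[i] = j
                break
            }
        }
    }
    (void) st_addvar("long", newvar)
    st_store(., newvar, found)
}
end

* The example data from the original question
input str40 comp_name
"Jayanthi chemicals"
"Jayanth chemicals"
"Jay chemicals"
end

* Flag names within 2 edits of another name
mata: near_match("comp_name", "match_obs", 2)
list comp_name match_obs
```

Here the first two names are one deletion apart and point to each other, while "Jay chemicals" is four or more edits from both and is left missing. For longer names, dividing the distance by the length of the longer string may be a more sensible threshold than a fixed count of letters.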




© Copyright 1996–2018 StataCorp LLC