Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: identifying strings that differ on one or two letters
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: identifying strings that differ on one or two letters
Date: Fri, 19 Nov 2010 16:16:12 +0000
On -strgroup-, the pertinent information appears to be in the help file:
"strgroup is implemented as a plugin in order to minimize memory requirements and to maximize speed.  Unfortunately, plugins are specific to the hardware architecture and software framework of your computer, i.e., plugins are not cross-platform.  Define a platform by two characteristics: machine type and operating system.  Stata stores these characteristics in c(machine_type) and c(os), respectively. strgroup supports the following platforms at this time:
         Machine type                   Operating system
         PC                             Windows
         PC (64-bit x86-64)             Unix
         Macintosh                      MacOSX
         Macintosh (Intel 64-bit)       MacOSX"
The message appears to imply that your platform is not supported. 
On -soundex()-, evidently that function classifies more coarsely than you need.
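A finer-grained criterion than a phonetic code, and one closer to "differ on one or two letters", is edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one string into another. What follows is only a minimal sketch in Python, not Stata code and not a description of how -strgroup- or -soundex()- work; the function names edit_distance and near_pairs and the default threshold of 2 are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def near_pairs(names, max_dist=2):
    """Return (name1, name2, distance) for distinct names within max_dist edits.

    max_dist=2 is an illustrative threshold, not a recommendation.
    """
    out = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = edit_distance(names[i], names[j])
            if 0 < d <= max_dist:
                out.append((names[i], names[j], d))
    return out
```

With a made-up misspelling such as "sindicate bank" next to "syndicate bank", near_pairs would flag that pair at distance 1 and leave "suniti commercials ltd" alone. The comparison is all pairs against all pairs, so it scales quadratically with the number of names, and the flagged pairs would still need checking by eye.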
These string matching problems are very difficult to automate in the sense of replicating what a knowledgeable human would do. 
Nick 
[email protected] 
Dalhia
I tried both techniques suggested by the list (thank you Dmitry and Scott). But neither seems to work, and I am hoping you can tell me what is wrong.
I can't seem to load "strgroup". When I try to install it on Stata 11, it gives me the following message:
"package does not contain strgroup.plugin for WIN64A platform could not load strgroup.pkg from http://fmwww.bc.edu/RePEc/bocode/s/"
I'm sure there is a simple fix, but my knowledge of Stata code is very basic, and I'm not sure how to fix this problem.
I also tried Soundex, but it identifies completely different companies as the same. For example, suniti commercials ltd, sunnytex investments pvt ltd, sunteck realty & infrastructure ltd, syndicate bank, all get the same soundex code S532. And soundex does not seem to allow any options that might limit matches to names that are very similar. 
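To see why these four names collide, it may help to trace the coding by hand. The sketch below is in Python rather than Stata and follows the commonly described Soundex rules (keep the first letter, code the following consonants as digits, drop vowels, collapse repeated codes, stop after three digits). It is an illustration under those assumptions; Stata's soundex() may differ in edge cases such as the treatment of h and w, although none of these four names depends on that.

```python
# Consonant groups and their Soundex digits; vowels, h, w and y carry no code.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (illustrative variant)."""
    # Keep only the letters a-z; spaces and punctuation are ignored.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # assumption: h and w do not break a run of equal codes
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


for company in ["suniti commercials ltd",
                "sunnytex investments pvt ltd",
                "sunteck realty & infrastructure ltd",
                "syndicate bank"]:
    print(company, soundex(company))
```

Each name reduces to S, then n (5), then t or d (3), then c or x (2), so all four come out as S532. Only the first few consonants ever enter the code, which is why names that diverge completely after the first word still match.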