Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Reclink: high matching score, but no match


From   Devra Golbe <[email protected]>
To   [email protected]
Subject   Re: st: Reclink: high matching score, but no match
Date   Fri, 24 Jan 2014 17:45:45 -0500

Dorothy and all,

My apologies for not closing this thread properly. Michael solved my problem in private correspondence, and I failed to report back to the list.

The problem was that, over successive runs of my do-file, I had managed to save the idusing variable to the master dataset. Once I dropped it from the master file, reclink ran fine.
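
A minimal sketch of that fix, assuming the variable names from the 2011 example quoted further down (student_name as the idusing variable, link as the score variable) and that the master data are already in memory. The list of variables to drop is an assumption about what an earlier run may have left behind, not a description of reclink's internals:

```stata
* Assumption: master data are in memory; names follow the 2011 example.
* Drop any copy of the using-file identifier saved into the master by an
* earlier run. -capture- keeps the do-file going if the variable is absent.
capture drop student_name

* Also drop the score and merge indicator from an earlier run, if present.
capture drop link
capture drop _merge

reclink lname fname using roster100f11Sep7.dta, ///
    idmaster(idmaster) idusing(student_name) gen(link)
```

Running describe on the master file before calling reclink is a quick way to check whether the idusing variable is already there.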

Devra

On 1/24/2014 1:43 PM Dorothy Bridges wrote:
Hello everyone (especially Devra and Michael): Was this ever resolved?
I'm having the exact same problem. Code and (partial) output copied
below. I usually use reclink without any problems.

reclink entidad municipio using "ESDATA/dta/MunicipioLevel/ESDATAMun_Jan2014.dta", ///
         idu(idu) idm(idm) gen(match) required(entidad)

listing the output:

entidad        municipio                    Umunicipio            match
ZULIA          VALMORE RODRIGUEZ            SAN FRANCISCO         0.9961
NUEVA ESPARTA  ANTOLIN DEL CAMPO            SOTILLO               0.9961
YARACUY        JOSE ANTONIO PAEZ            BRUZUAL               0.9961
ZULIA          LA CANADA DE URDANETA        FRANCISCO J PULG      0.9974
ARAGUA         MARIO BRICENO IRAGORRY       M.OCUMARE D LA COSTA  0.9977
ZULIA          ROSARIO DE PERIJA            MARA                  0.9982
ARAGUA         FRANCISCO LINARES ALCANTARA  FRANCISCO LINARES A.  0.9992
BARINAS        ALBERTO ARVELO TORREALBA     ZAMORA                0.9995
SUCRE          RIBERO                       MEJIA                 1.0000
CARABOBO       BEJUMA                       SIFONTES              1.0000
MIRANDA        URDANETA                     RIVAS DAVILA          1.0000
YARACUY        MANUEL MONGE                 INDEPENDENCIA         1.0000
DELTA AMACURO  CASACOIMA                    TINACO                1.0000
SUCRE          SUCRE                        MONTES                1.0000
NUEVA ESPARTA  GOMEZ                        GARCIA                1.0000
ARAGUA         JOSE ANGEL LAMAS             JOSE ANGEL LAMAS      1.0000
BARINAS        CRUZ PAREDES                 BARINAS               1.0000
MIRANDA        SUCRE                        RANGEL                1.0000
MIRANDA        SIMON BOLIVAR                PUEBLO LLANO          1.0000
ZULIA          PAEZ                         MACHIQUES DE P        1.0000

On Wed, Dec 28, 2011 at 10:04 AM, Devra Golbe <[email protected]> wrote:
Michael,

student_name is non-numeric. After some additional data cleaning, and the
resulting reduction of the set that needed a fuzzy match, reclink succeeded
with student_name as the idusing variable, so my original problem is solved.

But working with a smaller data set, I have an example where the non-numeric
identifier and a numeric identifier fail, but a different numeric identifier
succeeds.  I'll send those data and the do-file to you off-list.

Thanks and happy new year.

Devra


On 12/28/2011 11:49 AM Michael Blasnik wrote:
It looks like this is a bug -- is student_name numeric?  If not, you
may want to encode it and try again.  If that isn't the
problem, it might be best if you send me either the data or a trace
log off-list so I can try to figure it out, but I may not get a chance
to look at it until after the holidays.

Michael

On Sat, Dec 24, 2011 at 4:10 PM, Devra Golbe<[email protected]>  wrote:
I am using Michael Blasnik's reclink (from SSC) to match records. I get
extremely high matching scores, and yet the records do not match. Can
anyone help? My code and relevant output are pasted below.

Thanks and happy holidays,
Devra

******
. sort lname fname
. gen idmaster=_n
. tempfile ps1a
. save `ps1a', replace
. clear
. use roster100f11Sep7.dta
. sort lname fname
. save, replace
. clear
. use `ps1a'

. reclink lname fname using roster100f11Sep7.dta, ///
      idmaster(idmaster) idusing(student_name) gen(link)

0 perfect matches found


Added: student_name= identifier from roster100f11Sep7.dta   link = matching score
Observations:  Master N = 26    roster100f11Sep7.dta N= 182
   Unique Master Cases: matched = 0 (exact = 0), unmatched = 26

. list link _merge in 1/5, clean

          link   _merge
   1.   0.9933        1
   2.   0.9933        1
   3.        .        1
   4.   0.6420        1
   5.   0.9988        1

_______
Devra Golbe
Professor of Economics
Hunter College, CUNY
NY, NY
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2018 StataCorp LLC