



AW: st: nearmrg for strings (titles)


From   "Hoecher, Michaela (0613xxx)" <michaela.hoecher@edu.uni-graz.at>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   AW: st: nearmrg for strings (titles)
Date   Tue, 30 Aug 2011 20:57:03 +0200

Hello,

Thanks a lot for your response. I tried to test it, but I don't think I understood how to use it (I'm a beginner).

This is my example:

sample.raw
-----------------------------------
        1  "manuela Hech"
        2  "Chris Mueller"
        3  "Fanzisa Haller "
        4  "Ulrike Loerr"
-----------------------------------

universe.raw
----------------------------------
        1  "manuela Hecher"
        2  "Christian Mueller"
        3  "Fanzisa Haller "
        4  "Ulrike Loerr"
---------------------------------


When I run imatch.exe, it doesn't create the expected merge.txt. Instead, it creates these empty files:
canons.txt
fort.33
fort.34
fort.35
merge.raw


This is the output I get:

-----------------------------------
           1            32
           2            32
           3            32
           4            32
           5            32
           6            32
           7            32
           8            32
           9 1          49
          10            32
          11            32
          12 "          34
          13 m         109
          14 a          97
          15 n         110
          16 u         117
          17 e         101
          18 l         108
          19 a          97
          20            32
          21 H          72
          22 e         101
          23 c          99
          24 h         104
          25 e         101
          26 r         114
          27 "          34
          13
          29
          10
           1 1
          30            32
          31            32
          32            32
          33            32
          34            32
          35            32
          36            32
          37            32
          38 2          50
          39            32
          40            32
          41 "          34
          42 C          67
          43 h         104
          44 r         114
          45 i         105
          46 s         115
          47 t         116
          48 i         105
          49 a          97
          50 n         110
          51            32
          52 M          77
          53 u         117
          54 e         101
          55 l         108
          56 l         108
          57 e         101
          58 r         114
          59 "          34
          13
          61
          10
           2 2
          62            32
          63            32
          64            32
          65            32
          66            32
          67            32
          68            32
          69            32
          70 3          51
          71            32
          72            32
          73 "          34
          74 F          70
          75 a          97
          76 n         110
          77 z         122
          78 i         105
          79 s         115
          80 a          97
          81            32
          82 H          72
          83 a          97
          84 l         108
          85 l         108
          86 e         101
          87 r         114
          88            32
          89 "          34
          13
          91
          10
           3 3
          92            32
          93            32
          94            32
          95            32
          96            32
          97            32
          98            32
          99            32
         100 4          52
         101            32
         102            32
         103 "          34
         104 U          85
         105 l         108
         106 r         114
         107 i         105
         108 k         107
         109 e         101
         110            32
         111 L          76
         112 o         111
         113 e         101
         114 r         114
         115 r         114
         116 "          34
         117            -1

       3 records in universe.raw
       9 words
       6 unique words

Enter number of best match and return.
Enter an empty line for no match.

           1            32
           2            32
           3            32
           4            32
           5            32
           6            32
           7            32
           8            32
           9 1          49
          10            32
          11            32
          12 "          34
          13 m         109
          14 a          97
          15 n         110
          16 u         117
          17 e         101
          18 l         108
          19 a          97
          20            32
          21 H          72
          22 e         101
          23 c          99
          24 h         104
          25 "          34
          13
          27
          10
  manuela          2.19

 *  "manuela Hech"


 1. "manuela Hecher"

0-1:>

-----------------------------------
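The dump above appears to list, for each byte read, its position, the character, and its ASCII code; the values 13 and 10 would then be a carriage return and a line feed, and the final -1 is probably an end-of-file marker. To inspect a .raw file the same way outside imatch, a minimal Python sketch could look like the following. The function names and the example bytes are hypothetical illustrations, not part of imatch. One possible but unverified reading of "3 records in universe.raw" is that the last line has no trailing line break, which the summary below would reveal.

```python
def dump_bytes(data):
    """Return one line per byte: 1-based position, printable character (if any), byte value."""
    lines = []
    for position, value in enumerate(data, start=1):
        # Show only visible ASCII characters; spaces and control bytes are left blank.
        char = chr(value) if 33 <= value <= 126 else " "
        lines.append(f"{position:12d} {char} {value:11d}")
    return lines


def line_ending_summary(data):
    """Count CR LF pairs and lone LF bytes, and report whether the data ends with a line break."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_lf": data.count(b"\n") - crlf,
        "ends_with_newline": data.endswith(b"\n"),
    }


# Hypothetical two-record file whose last record has no trailing line break.
example = b'        1  "manuela Hech"\r\n        2  "Chris Mueller"'

for line in dump_bytes(example)[:14]:
    print(line)
print(line_ending_summary(example))
```

For a real file, read it in binary mode first, for example `data = open("universe.raw", "rb").read()`, and pass `data` to both functions.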


Thanks, Michaela

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Daniel Feenberg [feenberg@nber.org]
Sent: Tuesday, 30 August 2011 14:13
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: nearmrg for strings (titles)

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

> Hello!
>
> I would like to merge two datasets (variables: title, date, publisher).
> The problem is that strings (book titles) that are not exactly the same should be merged/matched.
> - Does it make sense to use nearmrg for this?
> - In which way are strings merged/matched?
> - What would you recommend?

Some time ago I wrote a program to help a clerical worker do this rapidly.
The program finds up to 5 likely matches, and lets the operator select the
best match. I used it once to go through a few thousand journal article
matches, but it hasn't been used since. There is documentation at:

   http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive,
but it isn't a GUI program - it runs from the command line and the
operator makes selections with the keyboard.
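The idea described above (propose up to 5 likely matches and let the operator pick one) can be sketched in Python as follows. This is not imatch's algorithm, which is not documented in this thread; the sketch swaps in the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function name `top_matches`, the `limit` and `cutoff` parameters, and the lower-casing and whitespace normalization are assumptions made for illustration. The data are the sample and universe titles from the message above.

```python
import difflib


def top_matches(query, universe, limit=5, cutoff=0.0):
    """Return up to `limit` (score, candidate) pairs for `query`, best match first."""
    # Assumed normalization: ignore case and collapse runs of whitespace.
    norm_query = " ".join(query.lower().split())
    scored = []
    for candidate in universe:
        norm_candidate = " ".join(candidate.lower().split())
        score = difflib.SequenceMatcher(None, norm_query, norm_candidate).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


universe = ["manuela Hecher", "Christian Mueller", "Fanzisa Haller ", "Ulrike Loerr"]
sample = ["manuela Hech", "Chris Mueller", "Fanzisa Haller ", "Ulrike Loerr"]

for title in sample:
    print(repr(title))
    for rank, (score, candidate) in enumerate(top_matches(title, universe), start=1):
        print(f"  {rank}. {candidate!r}  (similarity {score:.2f})")
```

In an interactive version, the operator would type the rank of the best candidate or an empty line for no match. Raising `cutoff` hides weak candidates, at the risk of hiding a correct but heavily misspelled title.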

Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.

Dan Feenberg


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


