Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

AW: st: nearmrg for strings (titles)

From	"Hoecher, Michaela (0613xxx)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	AW: st: nearmrg for strings (titles)
Date	Tue, 30 Aug 2011 20:57:03 +0200

Hello,

Thanks a lot for your response. I tried to test it, but I think I didn't understand how to use it (I'm a beginner).

This is my example:

sample.raw
-----------------------------------
        1  "manuela Hech"
        2  "Chris Mueller"
        3  "Fanzisa Haller "
        4  "Ulrike Loerr"
-----------------------------------

universe.raw
----------------------------------
        1  "manuela Hecher"
        2  "Christian Mueller"
        3  "Fanzisa Haller "
        4  "Ulrike Loerr"
---------------------------------


When I execute imatch.exe, it doesn't create the expected merge.txt; instead it creates these empty files:
canons.txt
fort.33
fort.34
fort.35
merge.raw


The output I get:

-----------------------------------
           1            32
           2            32
           3            32
           4            32
           5            32
           6            32
           7            32
           8            32
           9 1          49
          10            32
          11            32
          12 "          34
          13 m         109
          14 a          97
          15 n         110
          16 u         117
          17 e         101
          18 l         108
          19 a          97
          20            32
          21 H          72
          22 e         101
          23 c          99
          24 h         104
          25 e         101
          26 r         114
          27 "          34
          13
          29
          10
           1 1
          30            32
          31            32
          32            32
          33            32
          34            32
          35            32
          36            32
          37            32
          38 2          50
          39            32
          40            32
          41 "          34
          42 C          67
          43 h         104
          44 r         114
          45 i         105
          46 s         115
          47 t         116
          48 i         105
          49 a          97
          50 n         110
          51            32
          52 M          77
          53 u         117
          54 e         101
          55 l         108
          56 l         108
          57 e         101
          58 r         114
          59 "          34
          13
          61
          10
           2 2
          62            32
          63            32
          64            32
          65            32
          66            32
          67            32
          68            32
          69            32
          70 3          51
          71            32
          72            32
          73 "          34
          74 F          70
          75 a          97
          76 n         110
          77 z         122
          78 i         105
          79 s         115
          80 a          97
          81            32
          82 H          72
          83 a          97
          84 l         108
          85 l         108
          86 e         101
          87 r         114
          88            32
          89 "          34
          13
          91
          10
           3 3
          92            32
          93            32
          94            32
          95            32
          96            32
          97            32
          98            32
          99            32
         100 4          52
         101            32
         102            32
         103 "          34
         104 U          85
         105 l         108
         106 r         114
         107 i         105
         108 k         107
         109 e         101
         110            32
         111 L          76
         112 o         111
         113 e         101
         114 r         114
         115 r         114
         116 "          34
         117            -1

       3 records in universe.raw
       9 words
       6 unique words

Enter number of best match and return.
Enter an empty line for no match.

           1            32
           2            32
           3            32
           4            32
           5            32
           6            32
           7            32
           8            32
           9 1          49
          10            32
          11            32
          12 "          34
          13 m         109
          14 a          97
          15 n         110
          16 u         117
          17 e         101
          18 l         108
          19 a          97
          20            32
          21 H          72
          22 e         101
          23 c          99
          24 h         104
          25 "          34
          13
          27
          10
  manuela          2.19

 *  "manuela Hech"


 1. "manuela Hecher"

0-1:>

-----------------------------------


Thanks, Michaela

________________________________________
From: [email protected] [[email protected]] on behalf of Daniel Feenberg [[email protected]]
Sent: Tuesday, 30 August 2011 14:13
To: [email protected]
Subject: Re: st: nearmrg for strings (titles)

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

> Hello!
>
> I would like to merge two datasets (variables: title, date, publisher).
> The problem is that strings (titles of a book) that are not exactly the same should be merged/matched.
> - Does it make sense to use nearmrg for this?
> - In which way are strings merged/matched?
> - What would you recommend me?

Some time ago I wrote a program to help a clerical worker do this rapidly. The
program finds up to 5 likely matches, and lets the operator select the
best match. I used it once to go through a few thousand journal article
matches but it hasn't been used since. There is documentation at:

   http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive,
but it isn't a GUI program - it runs from the command line and the
operator makes selections with the keyboard.
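
For readers who want to see the general idea of "find up to 5 likely
matches, then let the operator choose", here is a minimal Python sketch.
It is not imatch's algorithm: the scoring uses difflib's SequenceMatcher
ratio from the Python standard library as a stand-in, and the function
name candidate_matches, the cutoff of 0.6, and the lowercase/whitespace
normalization are all assumptions made for illustration. The names are
the ones from the sample.raw and universe.raw example earlier in this
thread.

```python
import difflib


def candidate_matches(query, universe, limit=5, cutoff=0.6):
    """Return up to `limit` (score, title) pairs from `universe`, best first.

    Hypothetical helper: the similarity score is difflib's ratio, which is
    an assumption for this sketch and not the measure used by imatch.
    """
    def norm(s):
        # Lowercase and collapse whitespace so "Haller " matches "Haller".
        return " ".join(s.lower().split())

    q = norm(query)
    scored = []
    for title in universe:
        score = difflib.SequenceMatcher(None, q, norm(title)).ratio()
        if score >= cutoff:
            scored.append((score, title))
    # Highest similarity first; keep only the top `limit` candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


universe = ["manuela Hecher", "Christian Mueller", "Fanzisa Haller ", "Ulrike Loerr"]
sample = ["manuela Hech", "Chris Mueller", "Fanzisa Haller ", "Ulrike Loerr"]

for name in sample:
    print(f"* {name!r}")
    for rank, (score, title) in enumerate(candidate_matches(name, universe), 1):
        print(f"  {rank}. {title!r}  (similarity {score:.2f})")
```

In an interactive tool the numbered list would be followed by a prompt
where the operator types the number of the best candidate, or an empty
line for no match.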

Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.

Dan Feenberg


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


