Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org**, is already up and running.


From: Daniel Feenberg <feenberg@nber.org>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: AW: st: nearmrg for strings (titles)
Date: Fri, 2 Sep 2011 09:28:26 -0400 (EDT)

dan

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

Hello,

thanks a lot for your response. I tried to test it, but I think I didn't understand how to use it (I'm a beginner). This is my example:

sample.raw
-----------------------------------
1 "manuela Hech"
2 "Chris Mueller"
3 "Fanzisa Haller "
4 "Ulrike Loerr"
-----------------------------------

universe.raw
----------------------------------
1 "manuela Hecher"
2 "Christian Mueller"
3 "Fanzisa Haller "
4 "Ulrike Loerr"
---------------------------------

When I execute imatch.exe, it doesn't create the expected merge.txt but it creates 4 empty files:

canons.txt fort.33 fort.34 fort.35 merge.raw

The output I get:

-----------------------------------
1 32 2 32 3 32 4 32 5 32 6 32 7 32 8 32 9 1 49 10 32 11 32 12 " 34
13 m 109 14 a 97 15 n 110 16 u 117 17 e 101 18 l 108 19 a 97 20 32
21 H 72 22 e 101 23 c 99 24 h 104 25 e 101 26 r 114 27 " 34 13 29 10
1 1
30 32 31 32 32 32 33 32 34 32 35 32 36 32 37 32 38 2 50 39 32 40 32 41 " 34
42 C 67 43 h 104 44 r 114 45 i 105 46 s 115 47 t 116 48 i 105 49 a 97 50 n 110 51 32
52 M 77 53 u 117 54 e 101 55 l 108 56 l 108 57 e 101 58 r 114 59 " 34 13 61 10
2 2
62 32 63 32 64 32 65 32 66 32 67 32 68 32 69 32 70 3 51 71 32 72 32 73 " 34
74 F 70 75 a 97 76 n 110 77 z 122 78 i 105 79 s 115 80 a 97 81 32
82 H 72 83 a 97 84 l 108 85 l 108 86 e 101 87 r 114 88 32 89 " 34 13 91 10
3 3
92 32 93 32 94 32 95 32 96 32 97 32 98 32 99 32 100 4 52 101 32 102 32 103 " 34
104 U 85 105 l 108 106 r 114 107 i 105 108 k 107 109 e 101 110 32
111 L 76 112 o 111 113 e 101 114 r 114 115 r 114 116 " 34 117 -1
3 records in universe.raw
9 words
6 unique words
Enter number of best match and return.
Enter an empty line for no match.
1 32 2 32 3 32 4 32 5 32 6 32 7 32 8 32 9 1 49 10 32 11 32 12 " 34
13 m 109 14 a 97 15 n 110 16 u 117 17 e 101 18 l 108 19 a 97 20 32
21 H 72 22 e 101 23 c 99 24 h 104 25 " 34 13 27 10
manuela 2.19
* "manuela Hech"
1. "manuela Hecher"
0-1:>
-----------------------------------

Thanks,
Michaela

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Daniel Feenberg [feenberg@nber.org]
Sent: Tuesday, 30 August 2011 14:13
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: nearmrg for strings (titles)

On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

> Hello! I would like to merge two datasets (variables: title, date, publisher). The problem is that strings (titles of a book) that are not exactly the same should still be merged/matched.
> - Does it make sense to use nearmrg for this?
> - In which way are strings merged/matched?
> - What would you recommend?

Some time ago I wrote a program to help a clerical worker do this rapidly. The program finds up to 5 likely matches, and lets the operator select the best match. I used it once to go through a few thousand journal article matches but it hasn't been used since. There is documentation at:

http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive, but it isn't a GUI program - it runs from the command line and the operator makes selections with the keyboard.

Note that most commercial code to do matching is oriented towards address matching, and won't be particularly adept at author/title matching.

Dan Feenberg

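For readers unfamiliar with this kind of tool, the idea of "find up to 5 likely matches and let the operator pick one" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thread does not describe how imatch actually scores candidates, so the rarity-weighted shared-word score below is a stand-in, and the names `words`, `word_weights` and `candidates` are hypothetical. The interactive selection step is left out.

```python
import math
from collections import Counter


def words(title):
    """Lower-case a title and split it into words."""
    return title.lower().split()


def word_weights(universe):
    """Weight each word by rarity (assumed scheme, not imatch's own):
    words that occur in fewer universe records get a higher weight."""
    n = len(universe)
    doc_freq = Counter(w for title in universe for w in set(words(title)))
    return {w: math.log(1 + n / df) for w, df in doc_freq.items()}


def candidates(query, universe, weights, limit=5):
    """Return up to `limit` (score, title) pairs that share words with `query`,
    best score first."""
    query_words = set(words(query))
    scored = []
    for title in universe:
        shared = query_words & set(words(title))
        score = sum(weights[w] for w in shared)
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Records taken from the universe.raw example in the thread.
universe = ["manuela Hecher", "Christian Mueller", "Fanzisa Haller ", "Ulrike Loerr"]
weights = word_weights(universe)
for score, title in candidates("manuela Hech", universe, weights):
    print(f"{score:.2f}  {title}")
```

With exact word matching, "manuela Hech" finds "manuela Hecher" only through the shared first name, and a record with no word in common returns no candidates at all. A real tool would still need a human to confirm each proposed match, which is the interactive step described above.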
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
