



Re: st: nearmrg for strings (titles)


From   Daniel Feenberg <feenberg@nber.org>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: nearmrg for strings (titles)
Date   Tue, 30 Aug 2011 08:13:11 -0400 (EDT)



On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

Hello!

I would like to merge two datasets (variables: title, date, publisher).
The problem is that strings (titles of a book) that are not exactly the same should be merged/matched.
- Does it make sense to use nearmrg for this?
- In which way are strings merged/matched?
- What would you recommend?

Some time ago I wrote a program to help a clerical worker do this rapidly. The program finds up to 5 likely matches and lets the operator select the best match. I used it once to go through a few thousand journal article matches, but it hasn't been used since. There is documentation at:

  http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive, but it isn't a GUI program - it runs from the command line and the operator makes selections with the keyboard.
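
To illustrate the general idea of offering an operator up to 5 candidate matches per title, here is a minimal Python sketch. It is not the imatch program and does not reflect how it works internally; it assumes Python's standard difflib similarity ratio as the scoring rule, and the names normalize and candidate_matches, the 0.6 cutoff, and the sample titles are all hypothetical.

```python
import difflib


def normalize(title):
    """Lowercase and replace punctuation with spaces, so that trivial
    differences in case or punctuation do not block a match."""
    kept = [c.lower() if c.isalnum() else " " for c in title]
    return " ".join("".join(kept).split())


def candidate_matches(title, pool, n=5, cutoff=0.6):
    """Return up to n titles from pool that resemble title, best first.

    The cutoff (an assumed value) is the minimum difflib similarity
    ratio a candidate needs in order to be listed at all.
    """
    # Map each normalized title back to the first original spelling seen.
    by_norm = {}
    for t in pool:
        by_norm.setdefault(normalize(t), t)
    hits = difflib.get_close_matches(
        normalize(title), list(by_norm), n=n, cutoff=cutoff
    )
    return [by_norm[h] for h in hits]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    pool = [
        "The Theory of Moral Sentiments",
        "An Inquiry into the Nature and Causes of the Wealth of Nations",
        "Principles of Economics",
    ]
    query = "Theory of moral sentiments, The"
    # Show a numbered list from which an operator could pick the best match.
    for i, t in enumerate(candidate_matches(query, pool), 1):
        print(i, t)
```

An operator would then choose from the numbered list by hand, which keeps a person in the loop for titles that are similar but not the same work.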

Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.

Dan Feenberg



