Re: st: nearmrg for strings (titles)

From	Daniel Feenberg <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: nearmrg for strings (titles)
Date	Tue, 30 Aug 2011 08:13:11 -0400 (EDT)



On Tue, 30 Aug 2011, Hoecher, Michaela (0613xxx) wrote:

Hello!

I would like to merge two datasets (variables: title, date, publisher).
The problem is that strings (book titles) that are not exactly the same should still be merged/matched.
- Does it make sense to use nearmrg for this?
- In which way are strings merged/matched?
- What would you recommend?

Some time ago I wrote a program to help a clerical worker do this rapidly. The program finds up to 5 likely matches, and lets the operator select the best match. I used it once to go through a few thousand journal article matches but it hasn't been used since. There is documentation at:


  http://www.nber.org/imatch

and I would be interested in having a few more users. It is interactive, but it isn't a GUI program - it runs from the command line and the operator makes selections with the keyboard.
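
For readers who want to see the general idea of proposing up to 5 likely matches for each title, here is a minimal sketch in Python. It is not the imatch code and says nothing about how imatch scores candidates. It uses the standard library's difflib, and the function name candidate_matches, the sample titles, and the 0.6 similarity cutoff are all assumptions made for illustration.

```python
import difflib


def candidate_matches(title, candidates, n=5, cutoff=0.6):
    """Return up to n candidates most similar to title, best match first."""
    # Normalize case and whitespace so trivial differences do not lower the score.
    # The dict maps each normalized form back to the first original spelling seen.
    normalized = {}
    for c in candidates:
        normalized.setdefault(" ".join(c.lower().split()), c)
    key = " ".join(title.lower().split())
    # get_close_matches ranks by difflib.SequenceMatcher ratio and drops
    # anything scoring below the cutoff.
    close = difflib.get_close_matches(key, list(normalized), n=n, cutoff=cutoff)
    return [normalized[k] for k in close]


# Hypothetical sample data, not taken from the original post.
master = [
    "The Wealth of Nations",
    "Wealth of Nations, The",
    "A Treatise on Money",
    "The General Theory of Employment, Interest and Money",
]

# Print the ranked candidates an operator would choose from.
for rank, match in enumerate(candidate_matches("The Welth of Nations", master), 1):
    print(rank, match)
```

In an interactive workflow the operator would pick one of the printed candidates (or none) from the keyboard. A title with no candidate above the cutoff simply returns an empty list.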


Note that most commercial code to do matching is oriented towards
address matching, and won't be particularly adept at author/title
matching.

Dan Feenberg

