
st: Re: Question about match merge

From: "Michael Blasnik" <[email protected]>
To: <[email protected]>
Subject: st: Re: Question about match merge
Date: Wed, 05 Sep 2007 08:21:14 -0400

This was the topic of my talk at NASUG last month. I have not yet submitted my presentation and the reclink.ado file, but they are forthcoming. reclink.ado performs fairly basic probabilistic record linkage, which is what you describe. The current version lets you set matching and non-matching weights for each variable and does fuzzy string matching (using a bigram comparison), but it does not provide for observation-specific weighting. I have been considering approaches for adding that feature. The ado should be available within the next week or two.
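
For readers unfamiliar with the idea, here is a minimal Python sketch of bigram string comparison combined with per-variable matching and non-matching weights. It is not reclink's code or its scoring formula: the function names, the Dice-style similarity, the way the weights are combined, and the example weights are all assumptions chosen for illustration.

```python
from collections import Counter


def bigrams(s):
    """Return the adjacent character pairs of a trimmed, lowercased string."""
    s = s.strip().lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_similarity(a, b):
    """Dice-style similarity: 2 * shared bigrams / total bigrams, in [0, 1]."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Strings too short to have bigrams: fall back to exact comparison.
        return 1.0 if a.strip().lower() == b.strip().lower() else 0.0
    shared = sum((ca & cb).values())  # multiset intersection
    return 2.0 * shared / total


def link_score(rec_a, rec_b, weights):
    """Sum per-field evidence for a candidate pair of records.

    weights maps field -> (match_weight, nonmatch_weight). Agreement adds
    up to match_weight; disagreement subtracts up to nonmatch_weight.
    This combination rule is an illustrative assumption.
    """
    score = 0.0
    for field, (w_match, w_nonmatch) in weights.items():
        sim = bigram_similarity(str(rec_a[field]), str(rec_b[field]))
        score += w_match * sim - w_nonmatch * (1.0 - sim)
    return score


# Hypothetical weights: the name carries more evidence than the zip code.
weights = {"name": (10.0, 5.0), "zip": (4.0, 2.0)}
a = {"name": "jones", "zip": "02139"}
b = {"name": "jonse", "zip": "02139"}
score = link_score(a, b, weights)
```

Each candidate pair would be scored this way, and the highest-scoring pair above some cutoff kept as the link. The weights are fixed per variable here, so a common name and a rare name count the same, which is the limitation noted above.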

Michael Blasnik

----- Original Message -----
From: "Scott Talkington" <[email protected]>
To: <[email protected]>
Sent: Sunday, September 02, 2007 9:39 AM
Subject: st: Question about match merge

I seem to recall that there's an algorithm that is able to crosswalk
databases by matching names combined with other secondary keys, such as
zip code, and that the algorithm will produce a "probability of match"
for the given ID.  I used to conduct match merges based on name and zip
in an earlier version of Stata, but it was quite cumbersome to deal with
misspellings, typos (common transpositions of letters or numbers, etc.),
all caps vs lower case, prefixes and suffixes, titles, middle initial
versus middle name, etc.  What I'd like to know is whether a more
sophisticated match/merge based on primary and secondary keys or IDs has
been developed, and if so, whether there is some documentation on how it
works.  Also, would it account for very common names, such as "David
Jones", versus less common names, like "Horace Vilochkek", or for the
size of the database, and adjust the probability of match accordingly?
Or is all of this just some pipe dream I happened to think up when I was
under the influence?
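
The cleanup problems listed above (case, titles, suffixes, middle initial versus middle name) are usually handled by normalizing names before any matching is attempted. The Python sketch below shows one way this might look. The function name, the title and suffix lists, and the rule of reducing middle names to initials are assumptions for illustration, and it handles only unaccented ASCII letters.

```python
import re

# Hypothetical, deliberately short lists; real data would need longer ones.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd", "md"}


def normalize_name(raw):
    """Lowercase, turn punctuation into spaces, drop titles and suffixes,
    and reduce any middle names to single initials."""
    tokens = re.sub(r"[^a-z\s]", " ", raw.lower()).split()
    tokens = [t for t in tokens if t not in TITLES and t not in SUFFIXES]
    if len(tokens) > 2:
        middle = [t[0] for t in tokens[1:-1]]
        tokens = [tokens[0]] + middle + [tokens[-1]]
    return " ".join(tokens)


name_1 = normalize_name("Dr. David A. JONES, Jr.")
name_2 = normalize_name("David Allen Jones")
```

Both inputs reduce to the same key, so an exact merge on the normalized name plus zip would already catch them. Misspellings and transpositions still need a fuzzy comparison on top.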

I'll also try to scrounge up something on the FAQ database, but most of
my text documentation on Stata 9.2 is stored in boxes since I'm in the
midst of a move, and I need at least some idea of the capability of such
a match/merge within the week.

Scott Talkington, PhD
[email protected]