


st: identifying duplicate records

From   raoul reulen
Subject   st: identifying duplicate records
Date   Fri, 10 Feb 2012 14:08:04 +0000


I am looking for some advice. I have a large database of around
300,000 records of individuals, and there can be more than one record
per individual. How do I identify which records belong to the same
individual? I assume that two records refer to the same individual if:

date of birth and NHS number are the same, OR
date of birth and surname are the same, OR
surname and NHS number are the same.

So there are various combinations possible. A date of birth could have
typos in it; but if the NHS number and the surname are the same then I
assume it is the same person. The NHS number can have typos, but if
the date of birth and the surname are the same I will assume it is the
same person.

What is the best way to approach this? I want to end up with an
id-number that identifies each individual. Many thanks for your help.
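
One possible reading of the rule above, sketched in Python rather than Stata: treat any two records that agree on two of the three fields as linked, then give every chain of linked records the same id (a union-find grouping). This is only a sketch under assumptions not stated in the post: the field names `dob`, `nhs` and `surname` and the function `assign_person_ids` are hypothetical, matches are exact string comparisons, records with an empty field are never matched on that field, and links are chained transitively (if A matches B and B matches C, all three get one id).

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for illustration.
Record = namedtuple("Record", ["dob", "nhs", "surname"])


def assign_person_ids(records):
    """Return one id per record; records linked by any two-field match share an id."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so ids follow first appearance.
            parent[max(ri, rj)] = min(ri, rj)

    # The three matching rules from the post: any two of the three fields agree.
    field_pairs = [("dob", "nhs"), ("dob", "surname"), ("surname", "nhs")]
    for a, b in field_pairs:
        first_seen = {}
        for i, rec in enumerate(records):
            va, vb = getattr(rec, a), getattr(rec, b)
            if not va or not vb:
                continue  # assumption: a missing value never counts as a match
            key = (va, vb)
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    # Relabel group roots as consecutive ids 1, 2, 3, ...
    ids = {}
    result = []
    for i in range(len(records)):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids) + 1
        result.append(ids[root])
    return result


records = [
    Record("1970-01-01", "111", "Smith"),
    Record("1970-01-01", "111", "Smyth"),  # same dob + NHS number as the first
    Record("1970-01-10", "111", "Smyth"),  # same surname + NHS number as the second
    Record("1980-05-05", "222", "Jones"),
    Record("1980-05-05", "223", "Jones"),  # same dob + surname as the previous one
    Record("1970-01-01", "333", "Brown"),  # shares only dob with the first: no link
]
person_ids = assign_person_ids(records)
print(person_ids)  # [1, 1, 1, 2, 2, 3]
```

Each rule needs only one pass over the data, so the approach should scale to 300,000 records. Note that the transitive chaining can merge distinct people who happen to share, for example, a date of birth and a common surname, so the resulting groups would still need checking.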

