



st: identifying duplicate records


From   raoul reulen <r.c.reulen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: identifying duplicate records
Date   Fri, 10 Feb 2012 14:08:04 +0000

Hello

Just wondering if I could get some advice. I have a large database
with around 300,000 records of individuals. There can be more than one
record per individual. Now, how do I identify individuals? I assume
that it is the same individual if:

date of birth and NHS number are the same, OR
date of birth and surname are the same, OR
surname and NHS number are the same.

So there are various combinations possible. A date of birth could have
typos in it; but if the NHS number and the surname are the same then I
assume it is the same person. The NHS number can have typos, but if
the date of birth and the surname are the same I will assume it is the
same person.

What is the best way to approach this? I want to end up with an
ID number that identifies the individual. Many thanks for your help.
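
To make the matching rule concrete, here is a minimal sketch in Python of
one possible approach, not a definitive one. It treats every record as a
node, links two records whenever they agree on any two of the three
fields, and gives each linked group one ID (a union-find grouping). The
field names dob, nhs and surname, the function name assign_person_ids,
and the choice to skip records with a missing field are all assumptions
made for illustration. Note that linking is transitive here: if record A
matches B and B matches C, all three get the same ID.

```python
def assign_person_ids(records):
    """Return one ID per record; records agreeing on any two of
    dob / nhs / surname (hypothetical field names) share an ID."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        # Merge two groups, keeping the smaller index as the root.
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # One pass per pair of fields named in the matching rule.
    for a, b in (("dob", "nhs"), ("dob", "surname"), ("surname", "nhs")):
        first_seen = {}
        for i, rec in enumerate(records):
            va, vb = rec.get(a), rec.get(b)
            if not va or not vb:
                continue  # assumption: a missing value never counts as a match
            key = (va, vb)
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i

    # Number the groups 1, 2, 3, ... in order of first appearance.
    ids = {}
    result = []
    for i in range(len(records)):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids) + 1
        result.append(ids[root])
    return result


# Made-up example records (not real data).
example = [
    {"dob": "1970-01-01", "nhs": "111", "surname": "Smith"},
    {"dob": "1970-01-02", "nhs": "111", "surname": "Smith"},  # typo in dob
    {"dob": "1970-01-01", "nhs": "112", "surname": "Smith"},  # typo in nhs
    {"dob": "1980-05-05", "nhs": "222", "surname": "Jones"},
    {"dob": "1970-01-01", "nhs": "333", "surname": "Brown"},  # shares dob only
]
print(assign_person_ids(example))  # [1, 1, 1, 2, 3]
```

With 300,000 records, two different people could plausibly share a date
of birth and a surname, so groups formed by that rule alone may need a
manual check.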

Raoul

