From | raoul reulen <r.c.reulen@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: identifying duplicate records |
Date | Fri, 10 Feb 2012 14:08:04 +0000 |
Hello,

I am wondering if I could get some advice. I have a large database with around 300,000 records of individuals, and there can be more than one record per individual. How do I identify which records belong to the same individual?

I assume that two records refer to the same individual if any of the following holds:

1. date of birth and NHS number are the same, or
2. date of birth and surname are the same, or
3. surname and NHS number are the same.

In other words, any two of the three fields must agree. A date of birth could have typos in it, but if the NHS number and the surname are the same, I assume it is the same person. Likewise, the NHS number can have typos, but if the date of birth and the surname are the same, I will assume it is the same person.

What is the best way to approach this? I want to end up with an ID number that identifies the individual.

Many thanks for your help.

Raoul
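
One possible way to make the matching rule above concrete is sketched below in Python. It is only an illustration under stated assumptions, not a tested solution for the actual data. The names `Record` and `assign_person_ids` are hypothetical. It also assumes that matches chain: if record A matches B on one pair of fields and B matches C on another pair, all three receive the same ID. Records with a missing value in either field of a pair are assumed not to match on that pair.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for illustration.
Record = namedtuple("Record", ["dob", "nhs", "surname"])


def assign_person_ids(records):
    """Return one ID per record; records that agree on any two of the
    three fields (directly or through a chain of matches) share an ID."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so IDs follow first appearance.
            parent[max(ri, rj)] = min(ri, rj)

    # The three matching rules from the question, one pass per rule.
    for a, b in [("dob", "nhs"), ("dob", "surname"), ("surname", "nhs")]:
        first_seen = {}
        for i, rec in enumerate(records):
            va, vb = getattr(rec, a), getattr(rec, b)
            if va in (None, "") or vb in (None, ""):
                continue  # assumption: missing values never count as a match
            key = (va, vb)
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i

    # Number the groups 1, 2, 3, ... in order of first appearance.
    ids = {}
    result = []
    for i in range(len(records)):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids) + 1
        result.append(ids[root])
    return result


records = [
    Record("1970-01-01", "111", "SMITH"),
    Record("1970-01-01", "111", "SMYTH"),  # same dob + NHS number as row 0
    Record("1970-01-10", "111", "SMYTH"),  # same surname + NHS number as row 1
    Record("1980-05-05", "222", "JONES"),
    Record("1980-05-05", "229", "JONES"),  # same dob + surname as row 3
    Record("1990-09-09", "333", "SMITH"),  # only the surname matches row 0
]
print(assign_person_ids(records))  # [1, 1, 1, 2, 2, 3]
```

Each rule needs only one pass over the data with a dictionary lookup, so the sketch should scale to a few hundred thousand records. Whether chained matches should really be merged is a judgment call: with a common surname and a common date of birth, chaining can join records of different people, so the resulting groups are worth checking by hand.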