st: finding duplicate data


From   Mahbubeh Parsaeian <mahbobehparsaeian@yahoo.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: finding duplicate data
Date   Sat, 17 Nov 2012 10:08:59 -0800 (PST)

Hi everybody,
I have a problem with finding duplicate data.
I am working with a dataset that consists of household information. In every
household, the interviewers were supposed to interview only one family member. Unfortunately,
in some families the interviewers interviewed two or more people, and I
need to delete the extra data.
To clarify the problem, suppose the id number identifies the household.
For example, a cluster might contain the id numbers 1, 2, 3. I have
explored the data and found that some id numbers are repeated within the same
cluster (for example 1 2 2), which shows that two or more people were interviewed in the same
family. I want to keep one person per family and delete the data for any
additional person.
My idea is to enumerate the id variable
within clusters and delete the id numbers that are repeated. I know
there is something like _n that numbers observations, but I don’t know
how to use it to enumerate the repetitions of the id variable within
clusters.
If possible, please help me find a good solution.
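
A minimal sketch of the enumeration idea described above, not a definitive solution. It assumes the cluster variable is named cluster and the household number is named id (both names are hypothetical), and that it does not matter which respondent is kept as long as there is exactly one per household.

```stata
* Assumed variable names: cluster (cluster identifier), id (household number).

* Optional check: report how often each cluster-household pair occurs.
duplicates report cluster id

* Sort so that repeated households sit together; the stable option keeps
* the current order of observations within each cluster-household pair.
sort cluster id, stable

* Number the observations within each cluster-household pair: 1, 2, ...
by cluster id: generate copy = _n

* Keep the first observation per household and drop the later ones.
drop if copy > 1
drop copy
```

If a particular respondent should be kept (for example, the household head), the data would need to be sorted on that criterion before numbering. The built-in command duplicates drop cluster id, force removes surplus observations in a single step, but the explicit numbering makes it easier to inspect which records are removed.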
